Arnd Bergmann [Fri, 23 Jun 2023 15:21:55 +0000 (17:21 +0200)]
arm64: ftrace: fix build error with CONFIG_FUNCTION_GRAPH_TRACER=n
It appears that a merge conflict ended up hiding a newly added constant
in some configurations:
arch/arm64/kernel/entry-ftrace.S: Assembler messages:
arch/arm64/kernel/entry-ftrace.S:59: Error: undefined symbol FTRACE_OPS_DIRECT_CALL used as an immediate value
FTRACE_OPS_DIRECT_CALL is still used when CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS
is enabled, even if CONFIG_FUNCTION_GRAPH_TRACER is disabled, so change the
ifdef accordingly.
Link: https://lkml.kernel.org/r/20230623152204.2216297-1-arnd@kernel.org
Cc: Will Deacon <will@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Donglin Peng <pengdonglin@sangfor.com.cn>
Fixes: 3646970322464 ("arm64: ftrace: Enable HAVE_FUNCTION_GRAPH_RETVAL")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Florent Revest <revest@chromium.org>
Reviewed-by: Randy Dunlap <rdunlap@infradead.org>
Tested-by: Randy Dunlap <rdunlap@infradead.org> # build-tested
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Mateusz Stachyra [Tue, 4 Jul 2023 10:27:06 +0000 (12:27 +0200)]
tracing: Fix null pointer dereference in tracing_err_log_open()
Fix an issue in function 'tracing_err_log_open'.
The function doesn't call 'seq_open' if the file is opened only with
write permissions, which results in 'file->private_data' being left as null.
If we then use 'lseek' on that opened file, 'seq_lseek' dereferences
'file->private_data' in 'mutex_lock(&m->lock)', resulting in a kernel panic.
Writing to this node requires root privileges, therefore this bug
has very little security impact.
Tracefs node: /sys/kernel/tracing/error_log
Example Kernel panic:
Unable to handle kernel NULL pointer dereference at virtual address
0000000000000038
Call trace:
mutex_lock+0x30/0x110
seq_lseek+0x34/0xb8
__arm64_sys_lseek+0x6c/0xb8
invoke_syscall+0x58/0x13c
el0_svc_common+0xc4/0x10c
do_el0_svc+0x24/0x98
el0_svc+0x24/0x88
el0t_64_sync_handler+0x84/0xe4
el0t_64_sync+0x1b4/0x1b8
Code:
d503201f aa0803e0 aa1f03e1 aa0103e9 (
c8e97d02)
---[ end trace
561d1b49c12cf8a5 ]---
Kernel panic - not syncing: Oops: Fatal exception
Link: https://lore.kernel.org/linux-trace-kernel/20230703155237eucms1p4dfb6a19caa14c79eb6c823d127b39024@eucms1p4
Link: https://lore.kernel.org/linux-trace-kernel/20230704102706eucms1p30d7ecdcc287f46ad67679fc8491b2e0f@eucms1p3
Cc: stable@vger.kernel.org
Fixes: 8a062902be725 ("tracing: Add tracing error log")
Signed-off-by: Mateusz Stachyra <m.stachyra@samsung.com>
Suggested-by: Steven Rostedt <rostedt@goodmis.org>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Donglin Peng [Fri, 23 Jun 2023 07:17:28 +0000 (15:17 +0800)]
tracing: Fix warnings when building htmldocs for function graph retval
When building htmldocs, the following warnings appear:
Documentation/trace/ftrace.rst:2797: WARNING: Literal block expected; none found.
Documentation/trace/ftrace.rst:2816: WARNING: Literal block expected; none found.
So fix it.
Link: https://lore.kernel.org/all/20230623143517.19ffc6c0@canb.auug.org.au/
Link: https://lkml.kernel.org/r/20230623071728.25688-1-pengdonglin@sangfor.com.cn
Fixes: 21c094d3f8a6 ("tracing: Add documentation for funcgraph-retval and funcgraph-retval-hex")
Signed-off-by: Donglin Peng <pengdonglin@sangfor.com.cn>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Donglin Peng [Sat, 8 Apr 2023 12:42:19 +0000 (05:42 -0700)]
riscv: ftrace: Enable HAVE_FUNCTION_GRAPH_RETVAL
The previous patch ("function_graph: Support recording and printing
the return value of function") has laid the groundwork for the for
the funcgraph-retval, and this modification makes it available on
the RISC-V platform.
We introduce a new structure called fgraph_ret_regs for the RISC-V
platform to hold return registers and the frame pointer. We then
fill its content in the return_to_handler and pass its address to
the function ftrace_return_to_handler to record the return value.
Link: https://lore.kernel.org/linux-trace-kernel/a8d71b12259f90e7e63d0ea654fcac95b0232bbc.1680954589.git.pengdonglin@sangfor.com.cn
Signed-off-by: Donglin Peng <pengdonglin@sangfor.com.cn>
Acked-by: Palmer Dabbelt <palmer@rivosinc.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Azeem Shaikh [Tue, 13 Jun 2023 00:41:25 +0000 (00:41 +0000)]
tracing/boot: Replace strlcpy with strscpy
strlcpy() reads the entire source buffer first.
This read may exceed the destination size limit.
This is both inefficient and can lead to linear read
overflows if a source string is not NUL-terminated [1].
In an effort to remove strlcpy() completely [2], replace
strlcpy() here with strscpy().
Direct replacement is safe here since return value of -E2BIG
is used to check for truncation instead of sizeof(dest).
[1] https://www.kernel.org/doc/html/latest/process/deprecated.html#strlcpy
[2] https://github.com/KSPP/linux/issues/89
Link: https://lore.kernel.org/linux-trace-kernel/20230613004125.3539934-1-azeemshaikh38@gmail.com
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Azeem Shaikh <azeemshaikh38@gmail.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Daniel Bristot de Oliveira [Tue, 6 Jun 2023 15:12:27 +0000 (17:12 +0200)]
tracing/timerlat: Add user-space interface
Going a step further, we propose a way to use any user-space
workload as the task waiting for the timerlat timer. This is done
via a per-CPU file named osnoise/cpu$id/timerlat_fd file.
The tracef_fd allows a task to open at a time. When a task reads
the file, the timerlat timer is armed for future osnoise/timerlat_period_us
time. When the timer fires, it prints the IRQ latency and
wakes up the user-space thread waiting in the timerlat_fd.
The thread then starts to run, executes the timerlat measurement, prints
the thread scheduling latency and returns to user-space.
When the thread rereads the timerlat_fd, the tracer will print the
user-ret(urn) latency, which is an additional metric.
This additional metric is also traced by the tracer and can be used, for
example of measuring the context switch overhead from kernel-to-user and
user-to-kernel, or the response time for an arbitrary execution in
user-space.
The tracer supports one thread per CPU, the thread must be pinned to
the CPU, and it cannot migrate while holding the timerlat_fd. The reason
is that the tracer is per CPU (nothing prohibits the tracer from
allowing migrations in the future). The tracer monitors the migration
of the thread and disables the tracer if detected.
The timerlat_fd is only available for opening/reading when timerlat
tracer is enabled, and NO_OSNOISE_WORKLOAD is set.
The simplest way to activate this feature from user-space is:
-------------------------------- %< -----------------------------------
int main(void)
{
char buffer[1024];
int timerlat_fd;
int retval;
long cpu = 0; /* place in CPU 0 */
cpu_set_t set;
CPU_ZERO(&set);
CPU_SET(cpu, &set);
if (sched_setaffinity(gettid(), sizeof(set), &set) == -1)
return 1;
snprintf(buffer, sizeof(buffer),
"/sys/kernel/tracing/osnoise/per_cpu/cpu%ld/timerlat_fd",
cpu);
timerlat_fd = open(buffer, O_RDONLY);
if (timerlat_fd < 0) {
printf("error opening %s: %s\n", buffer, strerror(errno));
exit(1);
}
for (;;) {
retval = read(timerlat_fd, buffer, 1024);
if (retval < 0)
break;
}
close(timerlat_fd);
exit(0);
}
-------------------------------- >% -----------------------------------
When disabling timerlat, if there is a workload holding the timerlat_fd,
the SIGKILL will be sent to the thread.
Link: https://lkml.kernel.org/r/69fe66a863d2792ff4c3a149bf9e32e26468bb3a.1686063934.git.bristot@kernel.org
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: William White <chwhite@redhat.com>
Cc: Daniel Bristot de Oliveira <bristot@kernel.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: Daniel Bristot de Oliveira <bristot@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Daniel Bristot de Oliveira [Tue, 6 Jun 2023 15:12:26 +0000 (17:12 +0200)]
tracing/osnoise: Skip running osnoise if all instances are off
In the case of all tracing instances being off, sleep for the entire
period.
Q: Why not kill all threads so?
A: It is valid and useful to start the threads with tracing off.
For example, rtla disables tracing, starts the tracer, applies the
scheduling setup to the threads, e.g., sched priority and cgroup,
and then begin tracing with all set.
Skipping the period helps to speed up rtla setup and save the
trace after a stop tracing.
Link: https://lkml.kernel.org/r/aa4dd9b7e76fcb63901fe5407e15ec002b318599.1686063934.git.bristot@kernel.org
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: William White <chwhite@redhat.com>
Cc: Daniel Bristot de Oliveira <bristot@kernel.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: Daniel Bristot de Oliveira <bristot@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Daniel Bristot de Oliveira [Tue, 6 Jun 2023 15:12:25 +0000 (17:12 +0200)]
tracing/osnoise: Switch from PF_NO_SETAFFINITY to migrate_disable
Currently, osnoise/timerlat threads run with PF_NO_SETAFFINITY set.
It works well, however, cgroups do not allow PF_NO_SETAFFINITY threads
to be accepted, and this creates a limitation to osnoise/timerlat.
To avoid this limitation, disable migration of the threads as soon
as they start to run, and then clean the PF_NO_SETAFFINITY flag (still)
used during thread creation.
If for some reason a thread migration is requested, e.g., via
sched_settafinity, the tracer thread will notice and exit.
Link: https://lkml.kernel.org/r/8ba8bc9c15b3ea40cf73cf67a9bc061a264609f0.1686063934.git.bristot@kernel.org
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: William White <chwhite@redhat.com>
Cc: Daniel Bristot de Oliveira <bristot@kernel.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: Daniel Bristot de Oliveira <bristot@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Jiri Olsa [Sun, 11 Jun 2023 13:00:29 +0000 (15:00 +0200)]
ftrace: Show all functions with addresses in available_filter_functions_addrs
Adding new available_filter_functions_addrs file that shows all available
functions (same as available_filter_functions) together with addresses,
like:
# cat available_filter_functions_addrs | head
ffffffff81000770 __traceiter_initcall_level
ffffffff810007c0 __traceiter_initcall_start
ffffffff81000810 __traceiter_initcall_finish
ffffffff81000860 trace_initcall_finish_cb
...
Note displayed address is the patch-site address and can differ from
/proc/kallsyms address.
It's useful to have address avilable for traceable symbols, so we don't
need to allways cross check kallsyms with available_filter_functions
(or the other way around) and have all the data in single file.
For backwards compatibility reasons we can't change the existing
available_filter_functions file output, but we need to add new file.
The problem is that we need to do 2 passes:
- through available_filter_functions and find out if the function is traceable
- through /proc/kallsyms to get the address for traceable function
Having available_filter_functions symbols together with addresses allow
us to skip the kallsyms step and we are ok with the address in
available_filter_functions_addr not being the function entry, because
kprobe_multi uses fprobe and that handles both entry and patch-site
address properly.
We have 2 interfaces how to create kprobe_multi link:
a) passing symbols to kernel
1) user gathers symbols and need to ensure that they are
trace-able -> pass through available_filter_functions file
2) kernel takes those symbols and translates them to addresses
through kallsyms api
3) addresses are passed to fprobe/ftrace through:
register_fprobe_ips
-> ftrace_set_filter_ips
b) passing addresses to kernel
1) user gathers symbols and needs to ensure that they are
trace-able -> pass through available_filter_functions file
2) user takes those symbols and translates them to addresses
through /proc/kallsyms
3) addresses are passed to the kernel and kernel calls:
register_fprobe_ips
-> ftrace_set_filter_ips
The new available_filter_functions_addrs file helps us with option b),
because we can make 'b 1' and 'b 2' in one step - while filtering traceable
functions, we get the address directly.
Link: https://lore.kernel.org/linux-trace-kernel/20230611130029.1202298-1-jolsa@kernel.org
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Andrii Nakryiko <andrii.nakryiko@gmail.com>
Tested-by: Jackie Liu <liuyun01@kylinos.cn> # x86
Suggested-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Suggested-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Donglin Peng [Sat, 8 Apr 2023 12:42:22 +0000 (05:42 -0700)]
selftests/ftrace: Add funcgraph-retval test case
Add a test case for the funcgraph-retval and funcgraph-retval-hex
trace options.
Link: https://lkml.kernel.org/r/9fedbd25e63f012cade5dad13be21225fec2fb5d.1680954589.git.pengdonglin@sangfor.com.cn
Signed-off-by: Donglin Peng <pengdonglin@sangfor.com.cn>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Donglin Peng [Sat, 8 Apr 2023 12:42:21 +0000 (05:42 -0700)]
LoongArch: ftrace: Enable HAVE_FUNCTION_GRAPH_RETVAL
The previous patch ("function_graph: Support recording and printing
the return value of function") has laid the groundwork for the for
the funcgraph-retval, and this modification makes it available on
the LoongArch platform.
We introduce a new structure called fgraph_ret_regs for the LoongArch
platform to hold return registers and the frame pointer. We then fill
its content in the return_to_handler and pass its address to the
function ftrace_return_to_handler to record the return value.
Link: https://lkml.kernel.org/r/c5462255e435fab363895c2d7433bc0f5a140411.1680954589.git.pengdonglin@sangfor.com.cn
Reviewed-by: Huacai Chen <chenhuacai@loongson.cn>
Signed-off-by: Donglin Peng <pengdonglin@sangfor.com.cn>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Donglin Peng [Sat, 8 Apr 2023 12:42:20 +0000 (05:42 -0700)]
x86/ftrace: Enable HAVE_FUNCTION_GRAPH_RETVAL
The previous patch ("function_graph: Support recording and printing
the return value of function") has laid the groundwork for the for
the funcgraph-retval, and this modification makes it available on
the x86 platform.
We introduce a new structure called fgraph_ret_regs for the x86
platform to hold return registers and the frame pointer. We then
fill its content in the return_to_handler and pass its address
to the function ftrace_return_to_handler to record the return
value.
Link: https://lkml.kernel.org/r/53a506f0f18ff4b7aeb0feb762f1c9a5e9b83ee9.1680954589.git.pengdonglin@sangfor.com.cn
Signed-off-by: Donglin Peng <pengdonglin@sangfor.com.cn>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Donglin Peng [Sat, 8 Apr 2023 12:42:18 +0000 (05:42 -0700)]
arm64: ftrace: Enable HAVE_FUNCTION_GRAPH_RETVAL
The previous patch ("function_graph: Support recording and printing
the return value of function") has laid the groundwork for the for
the funcgraph-retval, and this modification makes it available on
the ARM64 platform.
We introduce a new structure called fgraph_ret_regs for the ARM64
platform to hold return registers and the frame pointer. We then
fill its content in the return_to_handler and pass its address to
the function ftrace_return_to_handler to record the return value.
Link: https://lkml.kernel.org/r/c78366416ce93f704ae7000c4ee60eb4258c38f7.1680954589.git.pengdonglin@sangfor.com.cn
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Tested-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Donglin Peng <pengdonglin@sangfor.com.cn>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Donglin Peng [Sat, 8 Apr 2023 12:42:16 +0000 (05:42 -0700)]
tracing: Add documentation for funcgraph-retval and funcgraph-retval-hex
Add documentation for the two newly introduced options for the
function_graph tracer. The funcgraph-retval option is used to
control whether or not to display the return value, while the
funcgraph-retval-hex option is used to control the display
format of the return value.
Link: https://lkml.kernel.org/r/2b5635f05146161b54c9ea6307e25efe5ccebdad.1680954589.git.pengdonglin@sangfor.com.cn
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Donglin Peng <pengdonglin@sangfor.com.cn>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Donglin Peng [Sat, 8 Apr 2023 12:42:15 +0000 (05:42 -0700)]
function_graph: Support recording and printing the return value of function
Analyzing system call failures with the function_graph tracer can be a
time-consuming process, particularly when locating the kernel function
that first returns an error in the trace logs. This change aims to
simplify the process by recording the function return value to the
'retval' member of 'ftrace_graph_ret' and printing it when outputting
the trace log.
We have introduced new trace options: funcgraph-retval and
funcgraph-retval-hex. The former controls whether to display the return
value, while the latter controls the display format.
Please note that even if a function's return type is void, a return
value will still be printed. You can simply ignore it.
This patch only establishes the fundamental infrastructure. Subsequent
patches will make this feature available on some commonly used processor
architectures.
Here is an example:
I attempted to attach the demo process to a cpu cgroup, but it failed:
echo `pidof demo` > /sys/fs/cgroup/cpu/test/tasks
-bash: echo: write error: Invalid argument
The strace logs indicate that the write system call returned -EINVAL(-22):
...
write(1, "273\n", 4) = -1 EINVAL (Invalid argument)
...
To capture trace logs during a write system call, use the following
commands:
cd /sys/kernel/debug/tracing/
echo 0 > tracing_on
echo > trace
echo *sys_write > set_graph_function
echo *spin* > set_graph_notrace
echo *rcu* >> set_graph_notrace
echo *alloc* >> set_graph_notrace
echo preempt* >> set_graph_notrace
echo kfree* >> set_graph_notrace
echo $$ > set_ftrace_pid
echo function_graph > current_tracer
echo 1 > options/funcgraph-retval
echo 0 > options/funcgraph-retval-hex
echo 1 > tracing_on
echo `pidof demo` > /sys/fs/cgroup/cpu/test/tasks
echo 0 > tracing_on
cat trace > ~/trace.log
To locate the root cause, search for error code -22 directly in the file
trace.log and identify the first function that returned -22. Once you
have identified this function, examine its code to determine the root
cause.
For example, in the trace log below, cpu_cgroup_can_attach
returned -22 first, so we can focus our analysis on this function to
identify the root cause.
...
1) | cgroup_migrate() {
1) 0.651 us | cgroup_migrate_add_task(); /* = 0xffff93fcfd346c00 */
1) | cgroup_migrate_execute() {
1) | cpu_cgroup_can_attach() {
1) | cgroup_taskset_first() {
1) 0.732 us | cgroup_taskset_next(); /* = 0xffff93fc8fb20000 */
1) 1.232 us | } /* cgroup_taskset_first = 0xffff93fc8fb20000 */
1) 0.380 us | sched_rt_can_attach(); /* = 0x0 */
1) 2.335 us | } /* cpu_cgroup_can_attach = -22 */
1) 4.369 us | } /* cgroup_migrate_execute = -22 */
1) 7.143 us | } /* cgroup_migrate = -22 */
...
Link: https://lkml.kernel.org/r/1fc502712c981e0e6742185ba242992170ac9da8.1680954589.git.pengdonglin@sangfor.com.cn
Tested-by: Florian Kauer <florian.kauer@linutronix.de>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Donglin Peng <pengdonglin@sangfor.com.cn>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Steven Rostedt (Google) [Tue, 20 Jun 2023 22:34:55 +0000 (18:34 -0400)]
fgraph: Add declaration of "struct fgraph_ret_regs"
In final testing of:
https://patchwork.kernel.org/project/linux-trace-kernel/patch/
1fc502712c981e0e6742185ba242992170ac9da8.
1680954589.git.pengdonglin@sangfor.com.cn/
"function_graph: Support recording and printing the return value of function"
The test failed due to a new warning found in the build:
kernel/trace/fgraph.c:243:56: warning: ‘struct fgraph_ret_regs’ declared inside parameter list will not be visible outside of this definition or declaration
Instead of asking to send another patch series, just add it and then apply
the updates.
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Linus Torvalds [Sun, 4 Jun 2023 18:04:27 +0000 (14:04 -0400)]
Linux 6.4-rc5
Linus Torvalds [Sun, 4 Jun 2023 15:57:38 +0000 (11:57 -0400)]
Merge tag 'irq_urgent_for_v6.4_rc5' of git://git./linux/kernel/git/tip/tip
Pull irq fix from Borislav Petkov:
- Fix open firmware quirks validation so that they don't get applied
wrongly
* tag 'irq_urgent_for_v6.4_rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
irqchip/gic: Correctly validate OF quirk descriptors
Linus Torvalds [Sun, 4 Jun 2023 13:10:43 +0000 (09:10 -0400)]
Merge tag 'media/v6.4-4' of git://git./linux/kernel/git/mchehab/linux-media
Pull media fixes from Mauro Carvalho Chehab:
"Some driver fixes:
- a regression fix for the verisilicon driver
- uvcvideo: don't expose unsupported video formats to userspace
- camss-video: don't zero subdev format after init
- mediatek: some fixes for 4K decoder formats
- fix a Sphinx build warning (missing doc for client_caps)
- some fixes for imx and atomisp staging drivers
And two CEC core fixes:
- don't set last_initiator if TX in progress
- disable adapter in cec_devnode_unregister"
* tag 'media/v6.4-4' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media:
media: uvcvideo: Don't expose unsupported formats to userspace
media: v4l2-subdev: Fix missing kerneldoc for client_caps
media: staging: media: imx: initialize hs_settle to avoid warning
media: v4l2-mc: Drop subdev check in v4l2_create_fwnode_links_to_pad()
media: staging: media: atomisp: init high & low vars
media: cec: core: don't set last_initiator if tx in progress
media: cec: core: disable adapter in cec_devnode_unregister
media: mediatek: vcodec: Only apply 4K frame sizes on decoder formats
media: camss: camss-video: Don't zero subdev format again after initialization
media: verisilicon: Additional fix for the crash when opening the driver
Linus Torvalds [Sun, 4 Jun 2023 12:32:30 +0000 (08:32 -0400)]
Merge tag 'char-misc-6.4-rc5' of git://git./linux/kernel/git/gregkh/char-misc
Pull char/misc driver fixes from Greg KH:
"Here are a bunch of tiny char/misc/other driver fixes for 6.4-rc5 that
resolve a number of reported issues. Included in here are:
- iio driver fixes
- fpga driver fixes
- test_firmware bugfixes
- fastrpc driver tiny bugfixes
- MAINTAINERS file updates for some subsystems
All of these have been in linux-next this past week with no reported
issues"
* tag 'char-misc-6.4-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (34 commits)
test_firmware: fix the memory leak of the allocated firmware buffer
test_firmware: fix a memory leak with reqs buffer
test_firmware: prevent race conditions by a correct implementation of locking
firmware_loader: Fix a NULL vs IS_ERR() check
MAINTAINERS: Vaibhav Gupta is the new ipack maintainer
dt-bindings: fpga: replace Ivan Bornyakov maintainership
MAINTAINERS: update Microchip MPF FPGA reviewers
misc: fastrpc: reject new invocations during device removal
misc: fastrpc: return -EPIPE to invocations on device removal
misc: fastrpc: Reassign memory ownership only for remote heap
misc: fastrpc: Pass proper scm arguments for secure map request
iio: imu: inv_icm42600: fix timestamp reset
iio: adc: ad_sigma_delta: Fix IRQ issue by setting IRQ_DISABLE_UNLAZY flag
dt-bindings: iio: adc: renesas,rcar-gyroadc: Fix adi,ad7476 compatible value
iio: dac: mcp4725: Fix i2c_master_send() return value handling
iio: accel: kx022a fix irq getting
iio: bu27034: Ensure reset is written
iio: dac: build ad5758 driver when AD5758 is selected
iio: addac:
ad74413: fix resistance input processing
iio: light: vcnl4035: fixed chip ID check
...
Linus Torvalds [Sun, 4 Jun 2023 12:02:25 +0000 (08:02 -0400)]
Merge tag 'driver-core-6.4-rc5' of git://git./linux/kernel/git/gregkh/driver-core
Pull driver core fixes from Greg KH:
"Here are two small driver core cacheinfo fixes for 6.4-rc5 that
resolve a number of reported issues with that file. These changes have
been in linux-next this past week with no reported problems"
* tag 'driver-core-6.4-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
drivers: base: cacheinfo: Update cpu_map_populated during CPU Hotplug
drivers: base: cacheinfo: Fix shared_cpu_map changes in event of CPU hotplug
Linus Torvalds [Sun, 4 Jun 2023 11:51:33 +0000 (07:51 -0400)]
Merge tag 'tty-6.4-rc5' of git://git./linux/kernel/git/gregkh/tty
Pull tty/serial driver fixes from Greg KH:
"Here are some small tty/serial driver fixes for 6.4-rc5 that have all
been in linux-next this past week with no reported problems. Included
in here are:
- 8250_tegra driver bugfix
- fsl uart driver bugfixes
- Kconfig fix for dependancy issue
- dt-bindings fix for the 8250_omap driver"
* tag 'tty-6.4-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
dt-bindings: serial: 8250_omap: add rs485-rts-active-high
serial: cpm_uart: Fix a COMPILE_TEST dependency
soc: fsl: cpm1: Fix TSA and QMC dependencies in case of COMPILE_TEST
tty: serial: fsl_lpuart: use UARTCTRL_TXINV to send break instead of UARTCTRL_SBK
serial: 8250_tegra: Fix an error handling path in tegra_uart_probe()
Linus Torvalds [Sun, 4 Jun 2023 11:31:48 +0000 (07:31 -0400)]
Merge tag 'usb-6.4-rc5' of git://git./linux/kernel/git/gregkh/usb
Pull USB fixes from Greg KH:
"Here are some USB driver and core fixes for 6.4-rc5. Most of these are
tiny driver fixes, including:
- udc driver bugfix
- f_fs gadget driver bugfix
- cdns3 driver bugfix
- typec bugfixes
But the "big" thing in here is a fix yet-again for how the USB buffers
are handled from userspace when dealing with DMA issues. The changes
were discussed a lot, and tested a lot, on the list, and acked by the
relevant mm maintainers and have been in linux-next all this past week
with no reported problems"
* tag 'usb-6.4-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
usb: typec: tps6598x: Fix broken polling mode after system suspend/resume
mm: page_table_check: Ensure user pages are not slab pages
mm: page_table_check: Make it dependent on EXCLUSIVE_SYSTEM_RAM
usb: usbfs: Use consistent mmap functions
usb: usbfs: Enforce page requirements for mmap
dt-bindings: usb: snps,dwc3: Fix "snps,hsphy_interface" type
usb: gadget: udc: fix NULL dereference in remove()
usb: gadget: f_fs: Add unbind event before functionfs_unbind
usb: cdns3: fix NCM gadget RX speed 20x slow than expection at iMX8QM
Linus Torvalds [Sun, 4 Jun 2023 11:16:53 +0000 (07:16 -0400)]
Merge tag 'for-linus' of git://git./virt/kvm/kvm
Pull kvm fixes from Paolo Bonzini:
"ARM:
- Address some fallout of the locking rework, this time affecting the
way the vgic is configured
- Fix an issue where the page table walker frees a subtree and then
proceeds with walking what it has just freed...
- Check that a given PA donated to the guest is actually memory (only
affecting pKVM)
- Correctly handle MTE CMOs by Set/Way
- Fix the reported address of a watchpoint forwarded to userspace
- Fix the freeing of the root of stage-2 page tables
- Stop creating spurious PMU events to perform detection of the
default PMU and use the existing PMU list instead
x86:
- Fix a memslot lookup bug in the NX recovery thread that could
theoretically let userspace bypass the NX hugepage mitigation
- Fix a s/BLOCKING/PENDING bug in SVM's vNMI support
- Account exit stats for fastpath VM-Exits that never leave the super
tight run-loop
- Fix an out-of-bounds bug in the optimized APIC map code, and add a
regression test for the race"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: selftests: Add test for race in kvm_recalculate_apic_map()
KVM: x86: Bail from kvm_recalculate_phys_map() if x2APIC ID is out-of-bounds
KVM: x86: Account fastpath-only VM-Exits in vCPU stats
KVM: SVM: vNMI pending bit is V_NMI_PENDING_MASK not V_NMI_BLOCKING_MASK
KVM: x86/mmu: Grab memslot for correct address space in NX recovery worker
KVM: arm64: Document default vPMU behavior on heterogeneous systems
KVM: arm64: Iterate arm_pmus list to probe for default PMU
KVM: arm64: Drop last page ref in kvm_pgtable_stage2_free_removed()
KVM: arm64: Populate fault info for watchpoint
KVM: arm64: Reload PTE after invoking walker callback on preorder traversal
KVM: arm64: Handle trap of tagged Set/Way CMOs
arm64: Add missing Set/Way CMO encodings
KVM: arm64: Prevent unconditional donation of unmapped regions from the host
KVM: arm64: vgic: Fix a comment
KVM: arm64: vgic: Fix locking comment
KVM: arm64: vgic: Wrap vgic_its_create() with config_lock
KVM: arm64: vgic: Fix a circular locking issue
Linus Torvalds [Sun, 4 Jun 2023 11:11:13 +0000 (07:11 -0400)]
Merge tag 'powerpc-6.4-4' of git://git./linux/kernel/git/powerpc/linux
Pull powerpc fixes from Michael Ellerman:
- Fix link errors in new aes-gcm-p10 code when built-in with other
drivers
- Limit number of TCEs passed to H_STUFF_TCE hcall as per spec
- Use KSYM_NAME_LEN in xmon array size to avoid possible OOB write
Thanks to Gaurav Batra and Maninder Singh Vishal Chourasia.
* tag 'powerpc-6.4-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc/xmon: Use KSYM_NAME_LEN in array size
powerpc/iommu: Limit number of TCEs to 512 for H_STUFF_TCE hcall
powerpc/crypto: Fix aes-gcm-p10 link errors
Paolo Bonzini [Sat, 3 Jun 2023 19:16:58 +0000 (15:16 -0400)]
Merge tag 'kvm-x86-fixes-6.4' of https://github.com/kvm-x86/linux into HEAD
KVM x86 fixes for 6.4
- Fix a memslot lookup bug in the NX recovery thread that could
theoretically let userspace bypass the NX hugepage mitigation
- Fix a s/BLOCKING/PENDING bug in SVM's vNMI support
- Account exit stats for fastpath VM-Exits that never leave the super
tight run-loop
- Fix an out-of-bounds bug in the optimized APIC map code, and add a
regression test for the race.
Paolo Bonzini [Sat, 3 Jun 2023 19:15:49 +0000 (15:15 -0400)]
Merge tag 'kvmarm-fixes-6.4-3' of git://git./linux/kernel/git/kvmarm/kvmarm into HEAD
KVM/arm64 fixes for 6.4, take #3
- Fix the reported address of a watchpoint forwarded to userspace
- Fix the freeing of the root of stage-2 page tables
- Stop creating spurious PMU events to perform detection of the
default PMU and use the existing PMU list instead.
Paolo Bonzini [Sat, 3 Jun 2023 19:14:18 +0000 (15:14 -0400)]
Merge tag 'kvmarm-fixes-6.4-2' of git://git./linux/kernel/git/kvmarm/kvmarm into HEAD
KVM/arm64 fixes for 6.4, take #2
- Address some fallout of the locking rework, this time affecting
the way the vgic is configured
- Fix an issue where the page table walker frees a subtree and
then proceeds with walking what it has just freed...
- Check that a given PA donated to the gues is actually memory
(only affecting pKVM)
- Correctly handle MTE CMOs by Set/Way
Linus Torvalds [Sat, 3 Jun 2023 17:52:24 +0000 (13:52 -0400)]
Merge tag 'scsi-fixes' of git://git./linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"Five fixes, all in drivers.
The most extensive is the target change to fix the hang in the login
code, which involves changing timers from per login to per connection"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: stex: Fix gcc 13 warnings
scsi: qla2xxx: Fix NULL pointer dereference in target mode
scsi: target: iscsi: Prevent login threads from racing between each other
scsi: target: iscsi: Remove unused transport_timer
scsi: target: iscsi: Fix hang in the iSCSI login code
Linus Torvalds [Sat, 3 Jun 2023 17:46:11 +0000 (13:46 -0400)]
Merge tag 'leds-6.4-rc5' of git://git./linux/kernel/git/johan/linux
Pull LED fix from Johan Hovold:
"Here's a fix for a regression in 6.4-rc1 which broke the backlight on
machines such as the Lenovo ThinkPad X13s"
Acked-by: Lee Jones <lee@kernel.org>
Link: https://lore.kernel.org/lkml/20230602091928.GR449117@google.com/
* tag 'leds-6.4-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/johan/linux:
leds: qcom-lpg: Fix PWM period limits
Bjorn Andersson [Mon, 15 May 2023 16:26:04 +0000 (09:26 -0700)]
leds: qcom-lpg: Fix PWM period limits
The introduction of high resolution PWM support changed the order of the
operations in the calculation of min and max period. The result in both
divisions is in most cases a truncation to 0, which limits the period to
the range of [0, 0].
Both numerators (and denominators) are within 64 bits, so the whole
expression can be put directly into the div64_u64, instead of doing it
partially.
Fixes: b00d2ed37617 ("leds: rgb: leds-qcom-lpg: Add support for high resolution PWM")
Reviewed-by: Caleb Connolly <caleb.connolly@linaro.org>
Tested-by: Steev Klimaszewski <steev@kali.org>
Signed-off-by: Bjorn Andersson <quic_bjorande@quicinc.com>
Acked-by: Lee Jones <lee@kernel.org>
Tested-by: Johan Hovold <johan+linaro@kernel.org>
Tested-by: Neil Armstrong <neil.armstrong@linaro.org> # on SM8550-QRD
Link: https://lore.kernel.org/r/20230515162604.649203-1-quic_bjorande@quicinc.com
Signed-off-by: Johan Hovold <johan@kernel.org>
Linus Torvalds [Sat, 3 Jun 2023 12:23:16 +0000 (08:23 -0400)]
Merge tag 'probes-fixes-6.4-rc4' of git://git./linux/kernel/git/trace/linux-trace
Pull probes fixes from Masami Hiramatsu:
- Return NULL if the trace_probe list on trace_probe_event is empty
- selftests/ftrace: Choose testing symbol name for filtering feature
from sample data instead of fixed symbol
* tag 'probes-fixes-6.4-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
selftests/ftrace: Choose target function for filter test from samples
tracing/probe: trace_probe_primary_from_call(): checked list_first_entry
Masami Hiramatsu (Google) [Sun, 19 Mar 2023 02:53:32 +0000 (11:53 +0900)]
selftests/ftrace: Choose target function for filter test from samples
Since the event-filter-function.tc expects the 'exit_mmap()' directly
calls 'kmem_cache_free()', this is vulnerable to code modifications.
Choose the target function for the filter test from the sample
event data so that it can keep test running correctly even if the caller
function name will be changed.
Link: https://lore.kernel.org/linux-trace-kernel/167919441260.1922645.18355804179347364057.stgit@mhiramat.roam.corp.google.com/
Link: https://lore.kernel.org/all/CA+G9fYtF-XEKi9YNGgR=Kf==7iRb2FrmEC7qtwAeQbfyah-UhA@mail.gmail.com/
Reported-by: Linux Kernel Functional Testing <lkft@linaro.org>
Fixes: 7f09d639b8c4 ("tracing/selftests: Add test for event filtering on function name")
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Michal Luczaj [Fri, 2 Jun 2023 23:32:50 +0000 (16:32 -0700)]
KVM: selftests: Add test for race in kvm_recalculate_apic_map()
Keep switching between LAPIC_MODE_X2APIC and LAPIC_MODE_DISABLED during
APIC map construction to hunt for TOCTOU bugs in KVM. KVM's optimized map
recalc makes multiple passes over the list of vCPUs, and the calculations
ignore vCPU's whose APIC is hardware-disabled, i.e. there's a window where
toggling LAPIC_MODE_DISABLED is quite interesting.
Signed-off-by: Michal Luczaj <mhal@rbox.co>
Co-developed-by: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/r/20230602233250.1014316-4-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Sean Christopherson [Fri, 2 Jun 2023 23:32:48 +0000 (16:32 -0700)]
KVM: x86: Bail from kvm_recalculate_phys_map() if x2APIC ID is out-of-bounds
Bail from kvm_recalculate_phys_map() and disable the optimized map if the
target vCPU's x2APIC ID is out-of-bounds, i.e. if the vCPU was added
and/or enabled its local APIC after the map was allocated. This fixes an
out-of-bounds access bug in the !x2apic_format path where KVM would write
beyond the end of phys_map.
Check the x2APIC ID regardless of whether or not x2APIC is enabled,
as KVM's hardcodes x2APIC ID to be the vCPU ID, i.e. it can't change, and
the map allocation in kvm_recalculate_apic_map() doesn't check for x2APIC
being enabled, i.e. the check won't get false postivies.
Note, this also affects the x2apic_format path, which previously just
ignored the "x2apic_id > new->max_apic_id" case. That too is arguably a
bug fix, as ignoring the vCPU meant that KVM would not send interrupts to
the vCPU until the next map recalculation. In practice, that "bug" is
likely benign as a newly present vCPU/APIC would immediately trigger a
recalc. But, there's no functional downside to disabling the map, and
a future patch will gracefully handle the -E2BIG case by retrying instead
of simply disabling the optimized map.
Opportunistically add a sanity check on the xAPIC ID size, along with a
comment explaining why the xAPIC ID is guaranteed to be "good".
Reported-by: Michal Luczaj <mhal@rbox.co>
Fixes: 5b84b0291702 ("KVM: x86: Honor architectural behavior for aliased 8-bit APIC IDs")
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20230602233250.1014316-2-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Sean Christopherson [Fri, 2 Jun 2023 01:19:19 +0000 (18:19 -0700)]
KVM: x86: Account fastpath-only VM-Exits in vCPU stats
Increment vcpu->stat.exits when handling a fastpath VM-Exit without
going through any part of the "slow" path. Not bumping the exits stat
can result in wildly misleading exit counts, e.g. if the primary reason
the guest is exiting is to program the TSC deadline timer.
Fixes: 404d5d7bff0d ("KVM: X86: Introduce more exit_fastpath_completion enum values")
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20230602011920.787844-2-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Maciej S. Szmigiero [Fri, 19 May 2023 11:26:18 +0000 (13:26 +0200)]
KVM: SVM: vNMI pending bit is V_NMI_PENDING_MASK not V_NMI_BLOCKING_MASK
While testing Hyper-V enabled Windows Server 2019 guests on Zen4 hardware
I noticed that with vCPU count large enough (> 16) they sometimes froze at
boot.
With vCPU count of 64 they never booted successfully - suggesting some kind
of a race condition.
Since adding "vnmi=0" module parameter made these guests boot successfully
it was clear that the problem is most likely (v)NMI-related.
Running kvm-unit-tests quickly showed failing NMI-related tests cases, like
"multiple nmi" and "pending nmi" from apic-split, x2apic and xapic tests
and the NMI parts of eventinj test.
The issue was that once one NMI was being serviced no other NMI was allowed
to be set pending (NMI limit = 0), which was traced to
svm_is_vnmi_pending() wrongly testing for the "NMI blocked" flag rather
than for the "NMI pending" flag.
Fix this by testing for the right flag in svm_is_vnmi_pending().
Once this is done, the NMI-related kvm-unit-tests pass successfully and
the Windows guest no longer freezes at boot.
Fixes: fa4c027a7956 ("KVM: x86: Add support for SVM's Virtual NMI")
Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/r/be4ca192eb0c1e69a210db3009ca984e6a54ae69.1684495380.git.maciej.szmigiero@oracle.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Sean Christopherson [Fri, 2 Jun 2023 01:01:37 +0000 (18:01 -0700)]
KVM: x86/mmu: Grab memslot for correct address space in NX recovery worker
Factor in the address space (non-SMM vs. SMM) of the target shadow page
when recovering potential NX huge pages, otherwise KVM will retrieve the
wrong memslot when zapping shadow pages that were created for SMM. The
bug most visibly manifests as a WARN on the memslot being non-NULL, but
the worst case scenario is that KVM could unaccount the shadow page
without ensuring KVM won't install a huge page, i.e. if the non-SMM slot
is being dirty logged, but the SMM slot is not.
------------[ cut here ]------------
WARNING: CPU: 1 PID: 3911 at arch/x86/kvm/mmu/mmu.c:7015
kvm_nx_huge_page_recovery_worker+0x38c/0x3d0 [kvm]
CPU: 1 PID: 3911 Comm: kvm-nx-lpage-re
RIP: 0010:kvm_nx_huge_page_recovery_worker+0x38c/0x3d0 [kvm]
RSP: 0018:
ffff99b284f0be68 EFLAGS:
00010246
RAX:
0000000000000000 RBX:
ffff99b284edd000 RCX:
0000000000000000
RDX:
0000000000000000 RSI:
0000000000000000 RDI:
0000000000000000
RBP:
ffff9271397024e0 R08:
0000000000000000 R09:
ffff927139702450
R10:
0000000000000000 R11:
0000000000000001 R12:
ffff99b284f0be98
R13:
0000000000000000 R14:
ffff9270991fcd80 R15:
0000000000000003
FS:
0000000000000000(0000) GS:
ffff927f9f640000(0000) knlGS:
0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
CR2:
00007f0aacad3ae0 CR3:
000000088fc2c005 CR4:
00000000003726e0
Call Trace:
<TASK>
__pfx_kvm_nx_huge_page_recovery_worker+0x10/0x10 [kvm]
kvm_vm_worker_thread+0x106/0x1c0 [kvm]
kthread+0xd9/0x100
ret_from_fork+0x2c/0x50
</TASK>
---[ end trace
0000000000000000 ]---
This bug was exposed by commit
edbdb43fc96b ("KVM: x86: Preserve TDP MMU
roots until they are explicitly invalidated"), which allowed KVM to retain
SMM TDP MMU roots effectively indefinitely. Before commit
edbdb43fc96b,
KVM would zap all SMM TDP MMU roots and thus all SMM TDP MMU shadow pages
once all vCPUs exited SMM, which made the window where this bug (recovering
an SMM NX huge page) could be encountered quite tiny. To hit the bug, the
NX recovery thread would have to run while at least one vCPU was in SMM.
Most VMs typically only use SMM during boot, and so the problematic shadow
pages were gone by the time the NX recovery thread ran.
Now that KVM preserves TDP MMU roots until they are explicitly invalidated
(e.g. by a memslot deletion), the window to trigger the bug is effectively
never closed because most VMMs don't delete memslots after boot (except
for a handful of special scenarios).
Fixes: eb298605705a ("KVM: x86/mmu: Do not recover dirty-tracked NX Huge Pages")
Reported-by: Fabio Coatti <fabio.coatti@gmail.com>
Closes: https://lore.kernel.org/all/CADpTngX9LESCdHVu_2mQkNGena_Ng2CphWNwsRGSMxzDsTjU2A@mail.gmail.com
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20230602010137.784664-1-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Lino Sanfilippo [Tue, 30 May 2023 16:41:16 +0000 (18:41 +0200)]
tpm, tpm_tis: correct tpm_tis_flags enumeration values
With commit
858e8b792d06 ("tpm, tpm_tis: Avoid cache incoherency in test
for interrupts") bit accessor functions are used to access flags in
tpm_tis_data->flags.
However these functions expect bit numbers, while the flags are defined
as bit masks in enum tpm_tis_flag.
Fix this inconsistency by using numbers instead of masks also for the
flags in the enum.
Reported-by: Pavel Machek <pavel@denx.de>
Fixes: 858e8b792d06 ("tpm, tpm_tis: Avoid cache incoherency in test for interrupts")
Signed-off-by: Lino Sanfilippo <l.sanfilippo@kunbus.com>
Cc: stable@vger.kernel.org
Reviewed-by: Pavel Machek <pavel@denx.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Fri, 2 Jun 2023 21:25:22 +0000 (17:25 -0400)]
Merge tag 'ext4_for_linus_stable' of git://git./linux/kernel/git/tytso/ext4
Pull ext4 fix from Ted Ts'o:
"Fix an ext4 regression which landed during the 6.4 merge window"
* tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
Revert "ext4: remove ac->ac_found > sbi->s_mb_min_to_scan dead check in ext4_mb_check_limits"
Linus Torvalds [Fri, 2 Jun 2023 21:16:19 +0000 (17:16 -0400)]
Merge tag 'for-6.4-rc4-tag' of git://git./linux/kernel/git/kdave/linux
Pull btrfs fix from David Sterba:
"One regression fix.
The rewrite of scrub code in 6.4 broke device replace in zoned mode,
some of the writes could happen out of order so this had to be
adjusted for all cases"
* tag 'for-6.4-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: zoned: fix dev-replace after the scrub rework
Ojaswin Mujoo [Tue, 30 May 2023 12:33:39 +0000 (18:03 +0530)]
Revert "ext4: remove ac->ac_found > sbi->s_mb_min_to_scan dead check in ext4_mb_check_limits"
This reverts commit
32c0869370194ae5ac9f9f501953ef693040f6a1.
The reverted commit was intended to remove a dead check however it was observed
that this check was actually being used to exit early instead of looping
sbi->s_mb_max_to_scan times when we are able to find a free extent bigger than
the goal extent. Due to this, a my performance tests (fsmark, parallel file
writes in a highly fragmented FS) were seeing a 2x-3x regression.
Example, the default value of the following variables is:
sbi->s_mb_max_to_scan = 200
sbi->s_mb_min_to_scan = 10
In ext4_mb_check_limits() if we find an extent smaller than goal, then we return
early and try again. This loop will go on until we have processed
sbi->s_mb_max_to_scan(=200) number of free extents at which point we exit and
just use whatever we have even if it is smaller than goal extent.
Now, the regression comes when we find an extent bigger than goal. Earlier, in
this case we would loop only sbi->s_mb_min_to_scan(=10) times and then just use
the bigger extent. However with commit
32c08693 that check was removed and hence
we would loop sbi->s_mb_max_to_scan(=200) times even though we have a big enough
free extent to satisfy the request. The only time we would exit early would be
when the free extent is *exactly* the size of our goal, which is pretty uncommon
occurrence and so we would almost always end up looping 200 times.
Hence, revert the commit by adding the check back to fix the regression. Also
add a comment to outline this policy.
Fixes: 32c086937019 ("ext4: remove ac->ac_found > sbi->s_mb_min_to_scan dead check in ext4_mb_check_limits")
Signed-off-by: Ojaswin Mujoo <ojaswin@linux.ibm.com>
Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Reviewed-by: Kemeng Shi <shikemeng@huaweicloud.com>
Link: https://lore.kernel.org/r/ddcae9658e46880dfec2fb0aa61d01fb3353d202.1685449706.git.ojaswin@linux.ibm.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Laurent Pinchart [Thu, 20 Apr 2023 09:45:59 +0000 (10:45 +0100)]
media: uvcvideo: Don't expose unsupported formats to userspace
When the uvcvideo driver encounters a format descriptor with an unknown
format GUID, it creates a corresponding struct uvc_format instance with
the fcc field set to 0. Since commit
50459f103edf ("media: uvcvideo:
Remove format descriptions"), the driver relies on the V4L2 core to
provide the format description string, which the V4L2 core can't do
without a valid 4CC. This triggers a WARN_ON.
As a format with a zero 4CC can't be selected, it is unusable for
applications. Ignore the format completely without creating a uvc_format
instance, which fixes the warning.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=217252
Link: https://bugzilla.redhat.com/show_bug.cgi?id=2180107
Fixes: 50459f103edf ("media: uvcvideo: Remove format descriptions")
Signed-off-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Reviewed-by: Ricardo Ribalda <ribalda@chromium.org>
Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
Linus Torvalds [Fri, 2 Jun 2023 17:47:36 +0000 (13:47 -0400)]
Merge tag 'riscv-for-linus-6.4-rc5' of git://git./linux/kernel/git/riscv/linux
Pull RISC-V fixes from Palmer Dabbelt:
- A build warning fix for BUILTIN_DTB=y
- Hibernation support is hidden behind NONPORTABLE, as it depends on
some undocumented early boot behavior and breaks on most platforms
- A fix for relocatable kernels on systems with early boot errata
- A fix to properly handle perf callchains for kernel tracepoints
- A pair of fixes for NAPOT to avoid inconsistencies between PTEs and
handle hardware that sets arbitrary A/D bits
* tag 'riscv-for-linus-6.4-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
riscv: Implement missing huge_ptep_get
riscv: Fix huge_ptep_set_wrprotect when PTE is a NAPOT
riscv: perf: Fix callchain parse error with kernel tracepoint events
riscv: Fix relocatable kernels with early alternatives using -fno-pie
RISC-V: mark hibernation as nonportable
riscv: Fix unused variable warning when BUILTIN_DTB is set
Tomi Valkeinen [Mon, 22 May 2023 10:52:45 +0000 (11:52 +0100)]
media: v4l2-subdev: Fix missing kerneldoc for client_caps
Add missing kernel doc for the new 'client_caps' field in struct
v4l2_subdev_fh.
Signed-off-by: Tomi Valkeinen <tomi.valkeinen@ideasonboard.com>
Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Fixes: f57fa2959244 ("media: v4l2-subdev: Add new ioctl for client capabilities")
Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
Hans Verkuil [Tue, 18 Apr 2023 07:46:52 +0000 (08:46 +0100)]
media: staging: media: imx: initialize hs_settle to avoid warning
Initialize hs_settle to 0 to avoid this compiler warning:
imx8mq-mipi-csi2.c: In function 'imx8mq_mipi_csi_start_stream.part.0':
imx8mq-mipi-csi2.c:91:55: warning: 'hs_settle' may be used uninitialized [-Wmaybe-uninitialized]
91 | #define GPR_CSI2_1_S_PRG_RXHS_SETTLE(x) (((x) & 0x3f) << 2)
| ^~
imx8mq-mipi-csi2.c:357:13: note: 'hs_settle' was declared here
357 | u32 hs_settle;
| ^~~~~~~~~
It's a false positive, but it is too complicated for the compiler to detect that.
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Reviewed-by: Martin Kepplinger <martink@posteo.de>
Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
Vaishnav Achath [Fri, 21 Apr 2023 10:04:30 +0000 (11:04 +0100)]
media: v4l2-mc: Drop subdev check in v4l2_create_fwnode_links_to_pad()
While updating v4l2_create_fwnode_links_to_pad() to accept non-subdev
sinks, the check is_media_entity_v4l2_subdev() was not removed which
prevented the function from being used with non-subdev sinks, Drop the
unnecessary check.
Fixes: bd5a03bc5be8 ("media: Accept non-subdev sinks in v4l2_create_fwnode_links_to_pad()")
Signed-off-by: Vaishnav Achath <vaishnav.a@ti.com>
Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
Linus Torvalds [Fri, 2 Jun 2023 17:38:55 +0000 (13:38 -0400)]
Merge tag 'nfsd-6.4-2' of git://git./linux/kernel/git/cel/linux
Pull nfsd fixes from Chuck Lever:
- Two minor bug fixes
* tag 'nfsd-6.4-2' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
nfsd: fix double fget() bug in __write_ports_addfd()
nfsd: make a copy of struct iattr before calling notify_change
Linus Torvalds [Fri, 2 Jun 2023 17:13:50 +0000 (13:13 -0400)]
Merge tag 'block-6.4-2023-06-02' of git://git.kernel.dk/linux
Pull block fixes from Jens Axboe:
"Just an NVMe pull request with (mostly) KATO fixes, a regression fix
for zoned device revalidation, and a fix for an md raid5 regression"
* tag 'block-6.4-2023-06-02' of git://git.kernel.dk/linux:
nvme: fix the name of Zone Append for verbose logging
nvme: improve handling of long keep alives
nvme: check IO start time when deciding to defer KA
nvme: double KA polling frequency to avoid KATO with TBKAS on
nvme: fix miss command type check
block: fix revalidate performance regression
md/raid5: fix miscalculation of 'end_sector' in raid5_read_one_chunk()
Linus Torvalds [Fri, 2 Jun 2023 17:08:27 +0000 (13:08 -0400)]
Merge tag 'io_uring-6.4-2023-06-02' of git://git.kernel.dk/linux
Pull io_uring fix from Jens Axboe:
"Just a single revert in here, removing the warning on the epoll ctl
opcode.
We originally deprecated this a few releases ago, but I've since had
two people report that it's being used. Which isn't the biggest deal,
obviously this is why we out in the deprecation notice in the first
place, but it also means that we should just kill this warning again
and abandon the deprecation plans.
Since it's only a few handfuls of code to support epoll ctl, not worth
going any further with this imho"
* tag 'io_uring-6.4-2023-06-02' of git://git.kernel.dk/linux:
io_uring: undeprecate epoll_ctl support
Linus Torvalds [Fri, 2 Jun 2023 12:35:13 +0000 (08:35 -0400)]
Merge tag 'mmc-v6.4-rc1-2' of git://git./linux/kernel/git/ulfh/mmc
Pull MMC fixes from Ulf Hansson:
"MMC core:
- Fix pwrseq for WILC1000/WILC3000 SDIO card
MMC host:
- vub300: Fix invalid response handling"
* tag 'mmc-v6.4-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc:
mmc: pwrseq: sd8787: Fix WILC CHIP_EN and RESETN toggling order
mmc: vub300: fix invalid response handling
Linus Torvalds [Fri, 2 Jun 2023 12:21:18 +0000 (08:21 -0400)]
Merge tag 'iommu-fixes-v6.4-rc4' of git://git./linux/kernel/git/joro/iommu
Pull iommu fixes from Joerg Roedel:
"AMD IOMMU fixes:
- Fix domain type and size checks
- IOTLB flush fix for invalidating ranges
- Guest IRQ handling fixes and GALOG overflow fix
Rockchip IOMMU:
- Error handling fix
Mediatek IOMMU:
- IOTLB flushing fix
Renesas IOMMU:
- Fix Kconfig dependencies to avoid build errors on RiscV"
* tag 'iommu-fixes-v6.4-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
iommu/mediatek: Flush IOTLB completely only if domain has been attached
iommu/amd/pgtbl_v2: Fix domain max address
iommu/amd: Fix domain flush size when syncing iotlb
iommu/amd: Add missing domain type checks
iommu/amd: Fix up merge conflict resolution
iommu/amd: Handle GALog overflows
iommu/amd: Don't block updates to GATag if guest mode is on
iommu/rockchip: Fix unwind goto issue
iommu: Make IPMMU_VMSA dependencies more strict
Linus Torvalds [Fri, 2 Jun 2023 11:42:22 +0000 (07:42 -0400)]
Merge tag 'drm-fixes-2023-06-02' of git://anongit.freedesktop.org/drm/drm
Pull drm fixes from Dave Airlie:
"Quiet enough week, though the misc fixes tree didn't get to me when I
was sending this, so maybe it'll be a bit bigger next week, just one
i915 fix and some scattered amdgpu fixes:
amdgpu:
- Fix mclk and fclk output ordering on some APUs
- Fix display regression with 5K VRR
- VCN, JPEG spurious interrupt warning fixes
- Fix SI DPM on some ARM64 platforms
- Fix missing TMZ enablement on GC 11.0.1
i915:
- Fix for OA reporting to allow detecting non-power-of-two reports"
* tag 'drm-fixes-2023-06-02' of git://anongit.freedesktop.org/drm/drm:
drm/i915/perf: Clear out entire reports after reading if not power of 2 size
drm/amdgpu: enable tmz by default for GC 11.0.1
drm/amd/pm: resolve reboot exception for si oland
drm/amdgpu: add RAS POISON interrupt funcs for jpeg_v4_0
drm/amdgpu: add RAS POISON interrupt funcs for jpeg_v2_6
drm/amdgpu: separate ras irq from jpeg instance irq for UVD_POISON
drm/amdgpu: add RAS POISON interrupt funcs for vcn_v4_0
drm/amdgpu: add RAS POISON interrupt funcs for vcn_v2_6
drm/amdgpu: separate ras irq from vcn instance irq for UVD_POISON
Revert "drm/amd/display: Do not set drr on pipe commit"
Revert "drm/amd/display: Block optimize on consecutive FAMS enables"
drm/amd/pm: reverse mclk and fclk clocks levels for renoir
drm/amd/pm: reverse mclk and fclk clocks levels for vangogh
drm/amd/pm: reverse mclk and fclk clocks levels for yellow carp
drm/amd/pm: reverse mclk clocks levels for SMU v13.0.5
drm/amd/pm: reverse mclk and fclk clocks levels for SMU v13.0.4
Linus Torvalds [Fri, 2 Jun 2023 11:30:27 +0000 (07:30 -0400)]
Merge tag 'selinux-pr-
20230601' of git://git./linux/kernel/git/pcmoore/selinux
Pull selinux fix from Paul Moore:
"A small SELinux Makefile fix to resolve a problem seen when building
the kernel with older versions of make.
The fix is pretty trivial and effectively reverts a patch that was
merged during the last merge window"
* tag 'selinux-pr-
20230601' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux:
selinux: don't use make's grouped targets feature yet
Alexandre Ghiti [Fri, 28 Apr 2023 12:01:20 +0000 (14:01 +0200)]
riscv: Implement missing huge_ptep_get
huge_ptep_get must be reimplemented in order to go through all the PTEs
of a NAPOT region: this is needed because the HW can update the A/D bits
of any of the PTE that constitutes the NAPOT region.
Fixes: 82a1a1f3bfb6 ("riscv: mm: support Svnapot in hugetlb page")
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Link: https://lore.kernel.org/r/20230428120120.21620-2-alexghiti@rivosinc.com
Cc: stable@vger.kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Alexandre Ghiti [Fri, 28 Apr 2023 12:01:19 +0000 (14:01 +0200)]
riscv: Fix huge_ptep_set_wrprotect when PTE is a NAPOT
We need to avoid inconsistencies across the PTEs that form a NAPOT
region, so when we write protect such a region, we should clear and flush
all the PTEs to make sure that any of those PTEs is not cached which would
result in such inconsistencies (arm64 does the same).
Fixes: 82a1a1f3bfb6 ("riscv: mm: support Svnapot in hugetlb page")
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Link: https://lore.kernel.org/r/20230428120120.21620-1-alexghiti@rivosinc.com
Cc: stable@vger.kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Linus Torvalds [Fri, 2 Jun 2023 00:48:16 +0000 (20:48 -0400)]
Merge tag 'modules-6.4-rc5-second-pull' of git://git./linux/kernel/git/mcgrof/linux
Pull modules fix from Luis Chamberlain:
"A zstd fix by lucas as he tested zstd decompression support"
* tag 'modules-6.4-rc5-second-pull' of git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux:
module/decompress: Fix error checking on zstd decompression
Linus Torvalds [Fri, 2 Jun 2023 00:43:11 +0000 (20:43 -0400)]
Merge tag 'efi-fixes-for-v6.4-1' of git://git./linux/kernel/git/efi/efi
Pull EFI fixes from Ard Biesheuvel:
"A few minor fixes for EFI, one of which fixes the reported boot
regression when booting x86 kernels using the BIOS based loader built
into the hypervisor framework on macOS.
- fix harmless warning in zboot code on 'make clean'
- add some missing prototypes
- fix boot regressions triggered by PE/COFF header image minor
version bump"
* tag 'efi-fixes-for-v6.4-1' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi:
efi: Bump stub image version for macOS HVF compatibility
efi: fix missing prototype warnings
efi/libstub: zboot: Avoid eager evaluation of objcopy flags
Dave Airlie [Fri, 2 Jun 2023 00:33:29 +0000 (10:33 +1000)]
Merge tag 'drm-intel-fixes-2023-06-01' of git://anongit.freedesktop.org/drm/drm-intel into drm-fixes
- Fix for OA reporting to allow detecting non-power-of-two reports
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/ZHimf55x/DyXYar1@jlahtine-mobl.ger.corp.intel.com
Dave Airlie [Thu, 1 Jun 2023 23:52:47 +0000 (09:52 +1000)]
Merge tag 'amd-drm-fixes-6.4-2023-05-31' of https://gitlab.freedesktop.org/agd5f/linux into drm-fixes
amd-drm-fixes-6.4-2023-05-31:
amdgpu:
- Fix mclk and fclk output ordering on some APUs
- Fix display regression with 5K VRR
- VCN, JPEG spurious interrupt warning fixes
- Fix SI DPM on some ARM64 platforms
- Fix missing TMZ enablement on GC 11.0.1
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230601033846.7628-1-alexander.deucher@amd.com
Linus Torvalds [Thu, 1 Jun 2023 21:50:22 +0000 (17:50 -0400)]
Merge tag 'fbdev-for-6.4-rc5' of git://git./linux/kernel/git/deller/linux-fbdev
Pull fbdev fixes from Helge Deller:
"Most notable is a fix for a null-ptr-deref in fbcon's soft_cursor
function which was found by syzbot.
- Fix null-ptr-deref in soft_cursor
- various remove callback conversions
- error path fixes in imsttfb"
* tag 'fbdev-for-6.4-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/linux-fbdev:
fbdev: bw2: Convert to platform remove callback returning void
fbdev: broadsheetfb: Convert to platform remove callback returning void
fbdev: au1200fb: Convert to platform remove callback returning void
fbdev: au1100fb: Convert to platform remove callback returning void
fbdev: arcfb: Convert to platform remove callback returning void
fbdev: au1100fb: Drop if with an always false condition
fbcon: Fix null-ptr-deref in soft_cursor
fbdev: imsttfb: Fix error path of imsttfb_probe()
fbdev: imsttfb: Release framebuffer and dealloc cmap on error path
fbdev: matroxfb ssd1307fb: Switch i2c drivers back to use .probe()
Lucas De Marchi [Thu, 1 Jun 2023 21:23:31 +0000 (14:23 -0700)]
module/decompress: Fix error checking on zstd decompression
While implementing support for in-kernel decompression in kmod,
finit_module() was returning a very suspicious value:
finit_module(3, "", MODULE_INIT_COMPRESSED_FILE) =
18446744072717407296
It turns out the check for module_get_next_page() failing is wrong,
and hence the decompression was not really taking place. Invert
the condition to fix it.
Fixes: 169a58ad824d ("module/decompress: Support zstd in-kernel decompression")
Cc: stable@kernel.org
Cc: Luis Chamberlain <mcgrof@kernel.org>
Cc: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Cc: Stephen Boyd <swboyd@chromium.org>
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
Linus Torvalds [Thu, 1 Jun 2023 21:35:17 +0000 (17:35 -0400)]
Merge tag 'mtd/fixes-for-6.4-rc5' of git://git./linux/kernel/git/mtd/linux
Pull mtd fixes from Miquel Raynal:
"MTD core:
- MAINTAINERS: Add Michal as reviewer instead of Naga
- mtdchar: Mark bits of ioctl handler noinline
NAND controller drivers:
- marvell:
- Don't set the NAND frequency select
- Ensure timing values are written
- ingenic: Fix empty stub helper definitions
SPI-NOR core:
- Fix divide by zero for spi-nor-generic flashes
SPI-NOR manufacturer driver:
- spansion: make sure local struct does not contain garbage"
* tag 'mtd/fixes-for-6.4-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux:
mtd: rawnand: marvell: don't set the NAND frequency select
mtd: rawnand: marvell: ensure timing values are written
mtdchar: mark bits of ioctl handler noinline
MAINTAINERS: Add myself as reviewer instead of Naga
mtd: spi-nor: Fix divide by zero for spi-nor-generic flashes
mtd: rawnand: ingenic: fix empty stub helper definitions
mtd: spi-nor: spansion: make sure local struct does not contain garbage
Linus Torvalds [Thu, 1 Jun 2023 21:29:18 +0000 (17:29 -0400)]
Merge tag 'net-6.4-rc5' of git://git./linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Happy Wear a Dress Day.
Fairly standard-sized batch of fixes, accounting for the lack of
sub-tree submissions this week. The mlx5 IRQ fixes are notable, people
were complaining about that. No fires burning.
Current release - regressions:
- eth: mlx5e:
- multiple fixes for dynamic IRQ allocation
- prevent encap offload when neigh update is running
- eth: mana: fix perf regression: remove rx_cqes, tx_cqes counters
Current release - new code bugs:
- eth: mlx5e: DR, add missing mutex init/destroy in pattern manager
Previous releases - always broken:
- tcp: deny tcp_disconnect() when threads are waiting
- sched: prevent ingress Qdiscs from getting installed in random
locations in the hierarchy and moving around
- sched: flower: fix possible OOB write in fl_set_geneve_opt()
- netlink: fix NETLINK_LIST_MEMBERSHIPS length report
- udp6: fix race condition in udp6_sendmsg & connect
- tcp: fix mishandling when the sack compression is deferred
- rtnetlink: validate link attributes set at creation time
- mptcp: fix connect timeout handling
- eth: stmmac: fix call trace when stmmac_xdp_xmit() is invoked
- eth: amd-xgbe: fix the false linkup in xgbe_phy_status
- eth: mlx5e:
- fix corner cases in internal buffer configuration
- drain health before unregistering devlink
- usb: qmi_wwan: set DTR quirk for BroadMobi BM818
Misc:
- tcp: return user_mss for TCP_MAXSEG in CLOSE/LISTEN state if
user_mss set"
* tag 'net-6.4-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (71 commits)
mptcp: fix active subflow finalization
mptcp: add annotations around sk->sk_shutdown accesses
mptcp: fix data race around msk->first access
mptcp: consolidate passive msk socket initialization
mptcp: add annotations around msk->subflow accesses
mptcp: fix connect timeout handling
rtnetlink: add the missing IFLA_GRO_ tb check in validate_linkmsg
rtnetlink: move IFLA_GSO_ tb check to validate_linkmsg
rtnetlink: call validate_linkmsg in rtnl_create_link
ice: recycle/free all of the fragments from multi-buffer frame
net: phy: mxl-gpy: extend interrupt fix to all impacted variants
net: renesas: rswitch: Fix return value in error path of xmit
net: dsa: mv88e6xxx: Increase wait after reset deactivation
net: ipa: Use correct value for IPA_STATUS_SIZE
tcp: fix mishandling when the sack compression is deferred.
net/sched: flower: fix possible OOB write in fl_set_geneve_opt()
sfc: fix error unwinds in TC offload
net/mlx5: Read embedded cpu after init bit cleared
net/mlx5e: Fix error handling in mlx5e_refresh_tirs
net/mlx5: Ensure af_desc.mask is properly initialized
...
Mike Christie [Thu, 1 Jun 2023 18:32:32 +0000 (13:32 -0500)]
fork, vhost: Use CLONE_THREAD to fix freezer/ps regression
When switching from kthreads to vhost_tasks two bugs were added:
1. The vhost worker tasks's now show up as processes so scripts doing
ps or ps a would not incorrectly detect the vhost task as another
process. 2. kthreads disabled freeze by setting PF_NOFREEZE, but
vhost tasks's didn't disable or add support for them.
To fix both bugs, this switches the vhost task to be thread in the
process that does the VHOST_SET_OWNER ioctl, and has vhost_worker call
get_signal to support SIGKILL/SIGSTOP and freeze signals. Note that
SIGKILL/STOP support is required because CLONE_THREAD requires
CLONE_SIGHAND which requires those 2 signals to be supported.
This is a modified version of the patch written by Mike Christie
<michael.christie@oracle.com> which was a modified version of patch
originally written by Linus.
Much of what depended upon PF_IO_WORKER now depends on PF_USER_WORKER.
Including ignoring signals, setting up the register state, and having
get_signal return instead of calling do_group_exit.
Tidied up the vhost_task abstraction so that the definition of
vhost_task only needs to be visible inside of vhost_task.c. Making
it easier to review the code and tell what needs to be done where.
As part of this the main loop has been moved from vhost_worker into
vhost_task_fn. vhost_worker now returns true if work was done.
The main loop has been updated to call get_signal which handles
SIGSTOP, freezing, and collects the message that tells the thread to
exit as part of process exit. This collection clears
__fatal_signal_pending. This collection is not guaranteed to
clear signal_pending() so clear that explicitly so the schedule()
sleeps.
For now the vhost thread continues to exist and run work until the
last file descriptor is closed and the release function is called as
part of freeing struct file. To avoid hangs in the coredump
rendezvous and when killing threads in a multi-threaded exec. The
coredump code and de_thread have been modified to ignore vhost threads.
Remvoing the special case for exec appears to require teaching
vhost_dev_flush how to directly complete transactions in case
the vhost thread is no longer running.
Removing the special case for coredump rendezvous requires either the
above fix needed for exec or moving the coredump rendezvous into
get_signal.
Fixes: 6e890c5d5021 ("vhost: use vhost_tasks for worker threads")
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Co-developed-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Francesco Dolcini [Wed, 31 May 2023 11:10:38 +0000 (13:10 +0200)]
dt-bindings: serial: 8250_omap: add rs485-rts-active-high
Add rs485-rts-active-high property, this was removed by mistake.
In general we just use rs485-rts-active-low property, however the OMAP
UART for legacy reason uses the -high one.
Fixes: 767d3467eb60 ("dt-bindings: serial: 8250_omap: drop rs485 properties")
Closes: https://lore.kernel.org/all/ZGefR4mTHHo1iQ7H@francesco-nb.int.toradex.com/
Signed-off-by: Francesco Dolcini <francesco.dolcini@toradex.com>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Link: https://lore.kernel.org/r/20230531111038.6302-1-francesco@dolcini.it
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Paul Moore [Thu, 1 Jun 2023 14:21:21 +0000 (10:21 -0400)]
selinux: don't use make's grouped targets feature yet
The Linux Kernel currently only requires make v3.82 while the grouped
target functionality requires make v4.3. Removed the grouped target
introduced in
4ce1f694eb5d ("selinux: ensure av_permissions.h is
built when needed") as well as the multiple header file targets in
the make rule. This effectively reverts the problem commit.
We will revisit this change when make >= 4.3 is required by the rest
of the kernel.
Cc: stable@vger.kernel.org
Fixes: 4ce1f694eb5d ("selinux: ensure av_permissions.h is built when needed")
Reported-by: Erwan Velu <e.velu@criteo.com>
Reported-by: Luiz Capitulino <luizcap@amazon.com>
Tested-by: Luiz Capitulino <luizcap@amazon.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Jakub Kicinski [Thu, 1 Jun 2023 17:15:43 +0000 (10:15 -0700)]
Merge tag 'mlx5-fixes-2023-05-31' of git://git./linux/kernel/git/saeed/linux
Saeed Mahameed says:
====================
mlx5 fixes 2023-05-31
This series provides bug fixes to mlx5 driver.
* tag 'mlx5-fixes-2023-05-31' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux:
net/mlx5: Read embedded cpu after init bit cleared
net/mlx5e: Fix error handling in mlx5e_refresh_tirs
net/mlx5: Ensure af_desc.mask is properly initialized
net/mlx5: Fix setting of irq->map.index for static IRQ case
net/mlx5: Remove rmap also in case dynamic MSIX not supported
====================
Link: https://lore.kernel.org/r/20230601031051.131529-1-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jens Axboe [Thu, 1 Jun 2023 17:12:46 +0000 (11:12 -0600)]
Merge tag 'nvme-6.4-2023-06-01' of git://git.infradead.org/nvme into block-6.4
Pull NVMe fixes from Keith:
"nvme fixes for Linux 6.4
- Fixes for spurious Keep Alive timeouts (Uday)
- Fix for command type check on passthrough actions (Min)
- Fix for nvme command name for error logging (Christoph)"
* tag 'nvme-6.4-2023-06-01' of git://git.infradead.org/nvme:
nvme: fix the name of Zone Append for verbose logging
nvme: improve handling of long keep alives
nvme: check IO start time when deciding to defer KA
nvme: double KA polling frequency to avoid KATO with TBKAS on
nvme: fix miss command type check
Ism Hong [Thu, 1 Jun 2023 09:53:55 +0000 (17:53 +0800)]
riscv: perf: Fix callchain parse error with kernel tracepoint events
For RISC-V, when tracing with tracepoint events, the IP and status are
set to 0, preventing the perf code parsing the callchain and resolving
the symbols correctly.
./ply 'tracepoint:kmem/kmem_cache_alloc { @[stack]=count(); }'
@:
{ <STACKID4294967282> }: 1
The fix is to implement perf_arch_fetch_caller_regs for riscv, which
fills several necessary registers used for callchain unwinding,
including epc, sp, s0 and status. It's similar to commit
b3eac0265bf6
("arm: perf: Fix callchain parse error with kernel tracepoint events")
and commit
5b09a094f2fb ("arm64: perf: Fix callchain parse error with
kernel tracepoint events").
With this patch, callchain can be parsed correctly as:
./ply 'tracepoint:kmem/kmem_cache_alloc { @[stack]=count(); }'
@:
{
__traceiter_kmem_cache_alloc+68
__traceiter_kmem_cache_alloc+68
kmem_cache_alloc+354
__sigqueue_alloc+94
__send_signal_locked+646
send_signal_locked+154
do_send_sig_info+84
__kill_pgrp_info+130
kill_pgrp+60
isig+150
n_tty_receive_signal_char+36
n_tty_receive_buf_standard+2214
n_tty_receive_buf_common+280
n_tty_receive_buf2+26
tty_ldisc_receive_buf+34
tty_port_default_receive_buf+62
flush_to_ldisc+158
process_one_work+458
worker_thread+138
kthread+178
riscv_cpufeature_patch_func+832
}: 1
Signed-off-by: Ism Hong <ism.hong@gmail.com>
Link: https://lore.kernel.org/r/20230601095355.1168910-1-ism.hong@gmail.com
Fixes: 178e9fc47aae ("perf: riscv: preliminary RISC-V support")
Cc: stable@vger.kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Jakub Kicinski [Thu, 1 Jun 2023 17:04:06 +0000 (10:04 -0700)]
Merge branch 'mptcp-fixes-for-connect-timeout-access-annotations-and-subflow-init'
Mat Martineau says:
====================
mptcp: Fixes for connect timeout, access annotations, and subflow init
Patch 1 allows the SO_SNDTIMEO sockopt to correctly change the connect
timeout on MPTCP sockets.
Patches 2-5 add READ_ONCE()/WRITE_ONCE() annotations to fix KCSAN issues.
Patch 6 correctly initializes some subflow fields on outgoing connections.
====================
Link: https://lore.kernel.org/r/20230531-send-net-20230531-v1-0-47750c420571@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Paolo Abeni [Wed, 31 May 2023 19:37:08 +0000 (12:37 -0700)]
mptcp: fix active subflow finalization
Active subflow are inserted into the connection list at creation time.
When the MPJ handshake completes successfully, a new subflow creation
netlink event is generated correctly, but the current code wrongly
avoid initializing a couple of subflow data.
The above will cause misbehavior on a few exceptional events: unneeded
mptcp-level retransmission on msk-level sequence wrap-around and infinite
mapping fallback even when a MPJ socket is present.
Address the issue factoring out the needed initialization in a new helper
and invoking the latter from __mptcp_finish_join() time for passive
subflow and from mptcp_finish_join() for active ones.
Fixes: 0530020a7c8f ("mptcp: track and update contiguous data status")
Cc: stable@vger.kernel.org
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Paolo Abeni [Wed, 31 May 2023 19:37:07 +0000 (12:37 -0700)]
mptcp: add annotations around sk->sk_shutdown accesses
Christoph reported the mptcp variant of a recently addressed plain
TCP issue. Similar to commit
e14cadfd80d7 ("tcp: add annotations around
sk->sk_shutdown accesses") add READ/WRITE ONCE annotations to silence
KCSAN reports around lockless sk_shutdown access.
Fixes: 71ba088ce0aa ("mptcp: cleanup accept and poll")
Reported-by: Christoph Paasch <cpaasch@apple.com>
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/401
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Paolo Abeni [Wed, 31 May 2023 19:37:06 +0000 (12:37 -0700)]
mptcp: fix data race around msk->first access
The first subflow socket is accessed outside the msk socket lock
by mptcp_subflow_fail(), we need to annotate each write access
with WRITE_ONCE, but a few spots still lacks it.
Fixes: 76a13b315709 ("mptcp: invoke MP_FAIL response when needed")
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Paolo Abeni [Wed, 31 May 2023 19:37:05 +0000 (12:37 -0700)]
mptcp: consolidate passive msk socket initialization
When the msk socket is cloned at MPC handshake time, a few
fields are initialized in a racy way outside mptcp_sk_clone()
and the msk socket lock.
The above is due historical reasons: before commit
a88d0092b24b
("mptcp: simplify subflow_syn_recv_sock()") as the first subflow socket
carrying all the needed date was not available yet at msk creation
time
We can now refactor the code moving the missing initialization bit
under the socket lock, removing the init race and avoiding some
code duplication.
This will also simplify the next patch, as all msk->first write
access are now under the msk socket lock.
Fixes: 0397c6d85f9c ("mptcp: keep unaccepted MPC subflow into join list")
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Paolo Abeni [Wed, 31 May 2023 19:37:04 +0000 (12:37 -0700)]
mptcp: add annotations around msk->subflow accesses
The MPTCP can access the first subflow socket in a few spots
outside the socket lock scope. That is actually safe, as MPTCP
will delete the socket itself only after the msk sock close().
Still the such accesses causes a few KCSAN splats, as reported
by Christoph. Silence the harmless warning adding a few annotation
around the relevant accesses.
Fixes: 71ba088ce0aa ("mptcp: cleanup accept and poll")
Reported-by: Christoph Paasch <cpaasch@apple.com>
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/402
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Paolo Abeni [Wed, 31 May 2023 19:37:03 +0000 (12:37 -0700)]
mptcp: fix connect timeout handling
Ondrej reported a functional issue WRT timeout handling on connect
with a nice reproducer.
The problem is that the current mptcp connect waits for both the
MPTCP socket level timeout, and the first subflow socket timeout.
The latter is not influenced/touched by the exposed setsockopt().
Overall the above makes the SO_SNDTIMEO a no-op on connect.
Since mptcp_connect is invoked via inet_stream_connect and the
latter properly handle the MPTCP level timeout, we can address the
issue making the nested subflow level connect always unblocking.
This also allow simplifying a bit the code, dropping an ugly hack
to handle the fastopen and custom proto_ops connect.
The issues predates the blamed commit below, but the current resolution
requires the infrastructure introduced there.
Fixes: 54f1944ed6d2 ("mptcp: factor out mptcp_connect()")
Reported-by: Ondrej Mosnacek <omosnace@redhat.com>
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/399
Cc: stable@vger.kernel.org
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Thu, 1 Jun 2023 16:59:45 +0000 (09:59 -0700)]
Merge branch 'rtnetlink-a-couple-of-fixes-in-linkmsg-validation'
Xin Long says:
====================
rtnetlink: a couple of fixes in linkmsg validation
validate_linkmsg() was introduced to do linkmsg validation for existing
links. However, the new created links also need this linkmsg validation.
Add validate_linkmsg() check for link creating in Patch 1, and add more
tb checks into validate_linkmsg() in Patch 2 and 3.
====================
Link: https://lore.kernel.org/r/cover.1685548598.git.lucien.xin@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Xin Long [Wed, 31 May 2023 16:01:44 +0000 (12:01 -0400)]
rtnetlink: add the missing IFLA_GRO_ tb check in validate_linkmsg
This fixes the issue that dev gro_max_size and gso_ipv4_max_size
can be set to a huge value:
# ip link add dummy1 type dummy
# ip link set dummy1 gro_max_size
4294967295
# ip -d link show dummy1
dummy addrgenmode eui64 ... gro_max_size
4294967295
Fixes: 0fe79f28bfaf ("net: allow gro_max_size to exceed 65536")
Fixes: 9eefedd58ae1 ("net: add gso_ipv4_max_size and gro_ipv4_max_size per device")
Reported-by: Xiumei Mu <xmu@redhat.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Xin Long [Wed, 31 May 2023 16:01:43 +0000 (12:01 -0400)]
rtnetlink: move IFLA_GSO_ tb check to validate_linkmsg
These IFLA_GSO_* tb check should also be done for the new created link,
otherwise, they can be set to a huge value when creating links:
# ip link add dummy1 gso_max_size
4294967295 type dummy
# ip -d link show dummy1
dummy addrgenmode eui64 ... gso_max_size
4294967295
Fixes: 46e6b992c250 ("rtnetlink: allow GSO maximums to be set on device creation")
Fixes: 9eefedd58ae1 ("net: add gso_ipv4_max_size and gro_ipv4_max_size per device")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Xin Long [Wed, 31 May 2023 16:01:42 +0000 (12:01 -0400)]
rtnetlink: call validate_linkmsg in rtnl_create_link
validate_linkmsg() was introduced by commit
1840bb13c22f5b ("[RTNL]:
Validate hardware and broadcast address attribute for RTM_NEWLINK")
to validate tb[IFLA_ADDRESS/BROADCAST] for existing links. The same
check should also be done for newly created links.
This patch adds validate_linkmsg() call in rtnl_create_link(), to
avoid the invalid address set when creating some devices like:
# ip link add dummy0 type dummy
# ip link add link dummy0 name mac0 address 01:02 type macsec
Fixes: 0e06877c6fdb ("[RTNETLINK]: rtnl_link: allow specifying initial device address")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Maciej Fijalkowski [Wed, 31 May 2023 15:44:57 +0000 (08:44 -0700)]
ice: recycle/free all of the fragments from multi-buffer frame
The ice driver caches next_to_clean value at the beginning of
ice_clean_rx_irq() in order to remember the first buffer that has to be
freed/recycled after main Rx processing loop. The end boundary is
indicated by first descriptor of frame that Rx processing loop has ended
its duties. Note that if mentioned loop ended in the middle of gathering
multi-buffer frame, next_to_clean would be pointing to the descriptor in
the middle of the frame BUT freeing/recycling stage will stop at the
first descriptor. This means that next iteration of ice_clean_rx_irq()
will miss the (first_desc, next_to_clean - 1) entries.
When running various 9K MTU workloads, such splats were observed:
[ 540.780716] BUG: kernel NULL pointer dereference, address:
0000000000000000
[ 540.787787] #PF: supervisor read access in kernel mode
[ 540.793002] #PF: error_code(0x0000) - not-present page
[ 540.798218] PGD 0 P4D 0
[ 540.800801] Oops: 0000 [#1] PREEMPT SMP NOPTI
[ 540.805231] CPU: 18 PID: 3984 Comm: xskxceiver Tainted: G W 6.3.0-rc7+ #96
[ 540.813619] Hardware name: Intel Corporation S2600WFT/S2600WFT, BIOS SE5C620.86B.02.01.0008.
031920191559 03/19/2019
[ 540.824209] RIP: 0010:ice_clean_rx_irq+0x2b6/0xf00 [ice]
[ 540.829678] Code: 74 24 10 e9 aa 00 00 00 8b 55 78 41 31 57 10 41 09 c4 4d 85 ff 0f 84 83 00 00 00 49 8b 57 08 41 8b 4f 1c 65 8b 35 1a fa 4b 3f <48> 8b 02 48 c1 e8 3a 39 c6 0f 85 a2 00 00 00 f6 42 08 02 0f 85 98
[ 540.848717] RSP: 0018:
ffffc9000f42fc50 EFLAGS:
00010282
[ 540.854029] RAX:
0000000000000004 RBX:
0000000000000002 RCX:
000000000000fffe
[ 540.861272] RDX:
0000000000000000 RSI:
0000000000000001 RDI:
00000000ffffffff
[ 540.868519] RBP:
ffff88984a05ac00 R08:
0000000000000000 R09:
dead000000000100
[ 540.875760] R10:
ffff88983fffcd00 R11:
000000000010f2b8 R12:
0000000000000004
[ 540.883008] R13:
0000000000000003 R14:
0000000000000800 R15:
ffff889847a10040
[ 540.890253] FS:
00007f6ddf7fe640(0000) GS:
ffff88afdf800000(0000) knlGS:
0000000000000000
[ 540.898465] CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
[ 540.904299] CR2:
0000000000000000 CR3:
000000010d3da001 CR4:
00000000007706e0
[ 540.911542] DR0:
0000000000000000 DR1:
0000000000000000 DR2:
0000000000000000
[ 540.918789] DR3:
0000000000000000 DR6:
00000000fffe0ff0 DR7:
0000000000000400
[ 540.926032] PKRU:
55555554
[ 540.928790] Call Trace:
[ 540.931276] <TASK>
[ 540.933418] ice_napi_poll+0x4ca/0x6d0 [ice]
[ 540.937804] ? __pfx_ice_napi_poll+0x10/0x10 [ice]
[ 540.942716] napi_busy_loop+0xd7/0x320
[ 540.946537] xsk_recvmsg+0x143/0x170
[ 540.950178] sock_recvmsg+0x99/0xa0
[ 540.953729] __sys_recvfrom+0xa8/0x120
[ 540.957543] ? do_futex+0xbd/0x1d0
[ 540.961008] ? __x64_sys_futex+0x73/0x1d0
[ 540.965083] __x64_sys_recvfrom+0x20/0x30
[ 540.969155] do_syscall_64+0x38/0x90
[ 540.972796] entry_SYSCALL_64_after_hwframe+0x72/0xdc
[ 540.977934] RIP: 0033:0x7f6de5f27934
To fix this, set cached_ntc to first_desc so that at the end, when
freeing/recycling buffers, descriptors from first to ntc are not missed.
Fixes: 2fba7dc5157b ("ice: Add support for XDP multi-buffer on Rx side")
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Tested-by: Chandan Kumar Rout <chandanx.rout@intel.com> (A Contingent Worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Link: https://lore.kernel.org/r/20230531154457.3216621-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Xu Liang [Wed, 31 May 2023 07:48:22 +0000 (15:48 +0800)]
net: phy: mxl-gpy: extend interrupt fix to all impacted variants
The interrupt fix in commit
97a89ed101bb should be applied on all variants
of GPY2xx PHY and GPY115C.
Fixes: 97a89ed101bb ("net: phy: mxl-gpy: disable interrupts on GPY215 by default")
Signed-off-by: Xu Liang <lxu@maxlinear.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Link: https://lore.kernel.org/r/20230531074822.39136-1-lxu@maxlinear.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Yoshihiro Shimoda [Mon, 29 May 2023 07:38:17 +0000 (16:38 +0900)]
net: renesas: rswitch: Fix return value in error path of xmit
Fix return value in the error path of rswitch_start_xmit(). If TX
queues are full, this function should return NETDEV_TX_BUSY.
Fixes: 3590918b5d07 ("net: ethernet: renesas: Add support for "Ethernet Switch"")
Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Link: https://lore.kernel.org/r/20230529073817.1145208-1-yoshihiro.shimoda.uh@renesas.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Chris Packham [Thu, 25 May 2023 00:31:53 +0000 (12:31 +1200)]
mtd: rawnand: marvell: don't set the NAND frequency select
marvell_nfc_setup_interface() uses the frequency retrieved from the
clock associated with the nand interface to determine the timings that
will be used. By changing the NAND frequency select without reflecting
this in the clock configuration this means that the timings calculated
don't correctly meet the requirements of the NAND chip. This hasn't been
an issue up to now because of a different bug that was stopping the
timings being updated after they were initially set.
Fixes: b25251414f6e ("mtd: rawnand: marvell: Stop implementing ->select_chip()")
Signed-off-by: Chris Packham <chris.packham@alliedtelesis.co.nz>
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
Link: https://lore.kernel.org/linux-mtd/20230525003154.2303012-2-chris.packham@alliedtelesis.co.nz
Chris Packham [Thu, 25 May 2023 00:31:52 +0000 (12:31 +1200)]
mtd: rawnand: marvell: ensure timing values are written
When new timing values are calculated in marvell_nfc_setup_interface()
ensure that they will be applied in marvell_nfc_select_target() by
clearing the selected_chip pointer.
Fixes: b25251414f6e ("mtd: rawnand: marvell: Stop implementing ->select_chip()")
Suggested-by: Miquel Raynal <miquel.raynal@bootlin.com>
Signed-off-by: Chris Packham <chris.packham@alliedtelesis.co.nz>
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
Link: https://lore.kernel.org/linux-mtd/20230525003154.2303012-1-chris.packham@alliedtelesis.co.nz
Arnd Bergmann [Mon, 17 Apr 2023 20:56:50 +0000 (22:56 +0200)]
mtdchar: mark bits of ioctl handler noinline
The addition of the mtdchar_read_ioctl() function caused the stack usage
of mtdchar_ioctl() to grow beyond the warning limit on 32-bit architectures
with gcc-13:
drivers/mtd/mtdchar.c: In function 'mtdchar_ioctl':
drivers/mtd/mtdchar.c:1229:1: error: the frame size of 1488 bytes is larger than 1024 bytes [-Werror=frame-larger-than=]
Mark both the read and write portions as noinline_for_stack to ensure
they don't get inlined and use separate stack slots to reduce the
maximum usage, both in the mtdchar_ioctl() and combined with any
of its callees.
Fixes: 095bb6e44eb1 ("mtdchar: add MEMREAD ioctl")
Cc: stable@vger.kernel.org
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
Link: https://lore.kernel.org/linux-mtd/20230417205654.1982368-1-arnd@kernel.org
Michal Simek [Fri, 26 May 2023 17:41:54 +0000 (19:41 +0200)]
MAINTAINERS: Add myself as reviewer instead of Naga
Naga no longer works for AMD/Xilinx and there is no activity from him to
continue to maintain Xilinx related drivers. Add myself instead to be kept
in loop if there is any need for testing.
Signed-off-by: Michal Simek <michal.simek@amd.com>
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
[<miquel.raynal@bootlin.com>: Manually apply on top of the latest -rc which
where the MAINTAINERS file got sorted]
Link: https://lore.kernel.org/linux-mtd/06df49c300c53a27423260e99acc217b06d4e588.1684827820.git.michal.simek@amd.com
Linus Torvalds [Thu, 1 Jun 2023 15:18:20 +0000 (11:18 -0400)]
Merge tag 'firewire-fixes-6.4-rc4' of git://git./linux/kernel/git/ieee1394/linux1394
Pull firewire fix from Takashi Sakamoto:
"A single patch to use a flexible array rather than a zero-length one"
* tag 'firewire-fixes-6.4-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394:
firewire: Replace zero-length array with flexible-array member
Linus Torvalds [Thu, 1 Jun 2023 15:13:10 +0000 (11:13 -0400)]
Merge tag 'mailbox-fixes-6.4-rc5' of git://git.linaro.org/landing-teams/working/fujitsu/integration
Pull mailbox fix from Jassi Brar:
"Fix missing mutex unlock in mailbox-test"
* tag 'mailbox-fixes-6.4-rc5' of git://git.linaro.org/landing-teams/working/fujitsu/integration:
mailbox: mailbox-test: fix a locking issue in mbox_test_message_write()
Andreas Svensson [Tue, 30 May 2023 14:52:23 +0000 (16:52 +0200)]
net: dsa: mv88e6xxx: Increase wait after reset deactivation
A switch held in reset by default needs to wait longer until we can
reliably detect it.
An issue was observed when testing on the Marvell 88E6393X (Link Street).
The driver failed to detect the switch on some upstarts. Increasing the
wait time after reset deactivation solves this issue.
The updated wait time is now also the same as the wait time in the
mv88e6xxx_hardware_reset function.
Fixes: 7b75e49de424 ("net: dsa: mv88e6xxx: wait after reset deactivation")
Signed-off-by: Andreas Svensson <andreas.svensson@axis.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://lore.kernel.org/r/20230530145223.1223993-1-andreas.svensson@axis.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Gustavo A. R. Silva [Mon, 29 May 2023 18:52:07 +0000 (12:52 -0600)]
firewire: Replace zero-length array with flexible-array member
Zero-length and one-element arrays are deprecated, and we are moving
towards adopting C99 flexible-array members, instead.
Address the following warnings found with GCC-13 and
-fstrict-flex-arrays=3 enabled:
sound/firewire/amdtp-stream.c: In function ‘build_it_pkt_header’:
sound/firewire/amdtp-stream.c:694:17: warning: ‘generate_cip_header’ accessing 8 bytes in a region of size 0 [-Wstringop-overflow=]
694 | generate_cip_header(s, cip_header, data_block_counter, syt);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
sound/firewire/amdtp-stream.c:694:17: note: referencing argument 2 of type ‘__be32[2]’ {aka ‘unsigned int[2]’}
sound/firewire/amdtp-stream.c:667:13: note: in a call to function ‘generate_cip_header’
667 | static void generate_cip_header(struct amdtp_stream *s, __be32 cip_header[2],
| ^~~~~~~~~~~~~~~~~~~
This helps with the ongoing efforts to tighten the FORTIFY_SOURCE
routines on memcpy() and help us make progress towards globally
enabling -fstrict-flex-arrays=3 [1].
Link: https://github.com/KSPP/linux/issues/21
Link: https://github.com/KSPP/linux/issues/303
Link: https://gcc.gnu.org/pipermail/gcc-patches/2022-October/602902.html
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/ZHT0V3SpvHyxCv5W@work
Signed-off-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>
Qu Wenruo [Thu, 1 Jun 2023 10:51:34 +0000 (18:51 +0800)]
btrfs: zoned: fix dev-replace after the scrub rework
[BUG]
After commit
e02ee89baa66 ("btrfs: scrub: switch scrub_simple_mirror()
to scrub_stripe infrastructure"), scrub no longer works for zoned device
at all.
Even an empty zoned btrfs cannot be replaced:
# mkfs.btrfs -f /dev/nvme0n1
# mount /dev/nvme0n1 /mnt/btrfs
# btrfs replace start -Bf 1 /dev/nvme0n2 /mnt/btrfs
Resetting device zones /dev/nvme1n1 (160 zones) ...
ERROR: ioctl(DEV_REPLACE_START) failed on "/mnt/btrfs/": Input/output error
And we can hit kernel crash related to that:
BTRFS info (device nvme1n1): host-managed zoned block device /dev/nvme3n1, 160 zones of
134217728 bytes
BTRFS info (device nvme1n1): dev_replace from /dev/nvme2n1 (devid 2) to /dev/nvme3n1 started
nvme3n1: Zone Management Append(0x7d) @ LBA 65536, 4 blocks, Zone Is Full (sct 0x1 / sc 0xb9) DNR
I/O error, dev nvme3n1, sector 786432 op 0xd:(ZONE_APPEND) flags 0x4000 phys_seg 3 prio class 2
BTRFS error (device nvme1n1): bdev /dev/nvme3n1 errs: wr 1, rd 0, flush 0, corrupt 0, gen 0
BUG: kernel NULL pointer dereference, address:
00000000000000a8
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS
rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
RIP: 0010:_raw_spin_lock_irqsave+0x1e/0x40
Call Trace:
<IRQ>
btrfs_lookup_ordered_extent+0x31/0x190
btrfs_record_physical_zoned+0x18/0x40
btrfs_simple_end_io+0xaf/0xc0
blk_update_request+0x153/0x4c0
blk_mq_end_request+0x15/0xd0
nvme_poll_cq+0x1d3/0x360
nvme_irq+0x39/0x80
__handle_irq_event_percpu+0x3b/0x190
handle_irq_event+0x2f/0x70
handle_edge_irq+0x7c/0x210
__common_interrupt+0x34/0xa0
common_interrupt+0x7d/0xa0
</IRQ>
<TASK>
asm_common_interrupt+0x22/0x40
[CAUSE]
Dev-replace reuses scrub code to iterate all extents and write the
existing content back to the new device.
And for zoned devices, we call fill_writer_pointer_gap() to make sure
all the writes into the zoned device is sequential, even if there may be
some gaps between the writes.
However we have several different bugs all related to zoned dev-replace:
- We are using ZONE_APPEND operation for metadata style write back
For zoned devices, btrfs has two ways to write data:
* ZONE_APPEND for data
This allows higher queue depth, but will not be able to know where
the write would land.
Thus needs to grab the real on-disk physical location in it's endio.
* WRITE for metadata
This requires single queue depth (new writes can only be submitted
after previous one finished), and all writes must be sequential.
For scrub, we go single queue depth, but still goes with ZONE_APPEND,
which requires btrfs_bio::inode being populated.
This is the cause of that crash.
- No correct tracing of write_pointer
After a write finished, we should forward sctx->write_pointer, or
fill_writer_pointer_gap() would not work properly and cause more
than necessary zero out, and fill the whole zone prematurely.
- Incorrect physical bytenr passed to fill_writer_pointer_gap()
In scrub_write_sectors(), one call site passes logical address, which
is completely wrong.
The other call site passes physical address of current sector, but
we should pass the physical address of the btrfs_bio we're submitting.
This is the cause of the -EIO errors.
[FIX]
- Do not use ZONE_APPEND for btrfs_submit_repair_write().
- Manually forward sctx->write_pointer after successful writeback
- Use the physical address of the to-be-submitted btrfs_bio for
fill_writer_pointer_gap()
Now zoned device replace would work as expected.
Reported-by: Christoph Hellwig <hch@lst.de>
Fixes: e02ee89baa66 ("btrfs: scrub: switch scrub_simple_mirror() to scrub_stripe infrastructure")
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Linus Torvalds [Thu, 1 Jun 2023 13:02:04 +0000 (09:02 -0400)]
Merge tag 'for-linus-
2023060101' of git://git./linux/kernel/git/hid/hid
Pull HID fixes from Jiri Kosina:
- Regression fix for overlong long timeouts during initialization on
some Logitech Unifying devices (Bastien Nocera)
- error handling and overflow fixes for Wacom driver (Denis Arefev,
Jason Gerecke, Nikita Zhandarovich)
* tag 'for-linus-
2023060101' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid:
HID: logitech-hidpp: Handle timeout differently from busy
HID: wacom: Add error check to wacom_parse_and_register()
HID: google: add jewel USB id
HID: wacom: avoid integer overflow in wacom_intuos_inout()
HID: wacom: Check for string overflow from strscpy calls
Linus Torvalds [Thu, 1 Jun 2023 12:41:33 +0000 (08:41 -0400)]
Merge tag 'ata-6.4-rc5' of git://git./linux/kernel/git/dlemoal/libata
Pull ata fix from Damien Le Moal:
- Fix ata_find_dev() use of the device number to find a struct
ata_device for a port. This addresses issues with some passthrough
commands with libsas managed devices.
* tag 'ata-6.4-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata:
ata: libata-scsi: Use correct device no in ata_find_dev()
Linus Torvalds [Thu, 1 Jun 2023 12:27:34 +0000 (08:27 -0400)]
Merge tag '6.4-rc4-smb3-server-fixes' of git://git.samba.org/ksmbd
Pull smb server fixes from Steve French:
"Eight server fixes (most also for stable):
- Two fixes for uninitialized pointer reads (rename and link)
- Fix potential UAF in oplock break
- Two fixes for potential out of bound reads in negotiate
- Fix crediting bug
- Two fixes for xfstests (allocation size fix for test 694 and lookup
issue shown by test 464)"
* tag '6.4-rc4-smb3-server-fixes' of git://git.samba.org/ksmbd:
ksmbd: call putname after using the last component
ksmbd: fix incorrect AllocationSize set in smb2_get_info
ksmbd: fix UAF issue from opinfo->conn
ksmbd: fix multiple out-of-bounds read during context decoding
ksmbd: fix slab-out-of-bounds read in smb2_handle_negotiate
ksmbd: fix credit count leakage
ksmbd: fix uninitialized pointer read in smb2_create_link()
ksmbd: fix uninitialized pointer read in ksmbd_vfs_rename()
Bert Karwatzki [Wed, 31 May 2023 10:36:19 +0000 (12:36 +0200)]
net: ipa: Use correct value for IPA_STATUS_SIZE
IPA_STATUS_SIZE was introduced in commit
b8dc7d0eea5a as a replacement
for the size of the removed struct ipa_status which had size
sizeof(__le32[8]). Use this value as IPA_STATUS_SIZE.
Fixes: b8dc7d0eea5a ("net: ipa: stop using sizeof(status)")
Signed-off-by: Bert Karwatzki <spasswolf@web.de>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Link: https://lore.kernel.org/r/20230531103618.102608-1-spasswolf@web.de
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
fuyuanli [Wed, 31 May 2023 08:01:50 +0000 (16:01 +0800)]
tcp: fix mishandling when the sack compression is deferred.
In this patch, we mainly try to handle sending a compressed ack
correctly if it's deferred.
Here are more details in the old logic:
When sack compression is triggered in the tcp_compressed_ack_kick(),
if the sock is owned by user, it will set TCP_DELACK_TIMER_DEFERRED
and then defer to the release cb phrase. Later once user releases
the sock, tcp_delack_timer_handler() should send a ack as expected,
which, however, cannot happen due to lack of ICSK_ACK_TIMER flag.
Therefore, the receiver would not sent an ack until the sender's
retransmission timeout. It definitely increases unnecessary latency.
Fixes: 5d9f4262b7ea ("tcp: add SACK compression")
Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: fuyuanli <fuyuanli@didiglobal.com>
Signed-off-by: Jason Xing <kerneljasonxing@gmail.com>
Link: https://lore.kernel.org/netdev/20230529113804.GA20300@didi-ThinkCentre-M920t-N000/
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20230531080150.GA20424@didi-ThinkCentre-M920t-N000
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Hangyu Hua [Wed, 31 May 2023 10:28:04 +0000 (18:28 +0800)]
net/sched: flower: fix possible OOB write in fl_set_geneve_opt()
If we send two TCA_FLOWER_KEY_ENC_OPTS_GENEVE packets and their total
size is 252 bytes(key->enc_opts.len = 252) then
key->enc_opts.len = opt->length = data_len / 4 = 0 when the third
TCA_FLOWER_KEY_ENC_OPTS_GENEVE packet enters fl_set_geneve_opt. This
bypasses the next bounds check and results in an out-of-bounds.
Fixes: 0a6e77784f49 ("net/sched: allow flower to match tunnel options")
Signed-off-by: Hangyu Hua <hbh25y@gmail.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Reviewed-by: Pieter Jansen van Vuuren <pieter.jansen-van-vuuren@amd.com>
Link: https://lore.kernel.org/r/20230531102805.27090-1-hbh25y@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Chen-Yu Tsai [Fri, 26 May 2023 08:53:59 +0000 (16:53 +0800)]
iommu/mediatek: Flush IOTLB completely only if domain has been attached
If an IOMMU domain was never attached, it lacks any linkage to the
actual IOMMU hardware. Attempting to do flush_iotlb_all() on it will
result in a NULL pointer dereference. This seems to happen after the
recent IOMMU core rework in v6.4-rc1.
Unable to handle kernel read from unreadable memory at virtual address
0000000000000018
Call trace:
mtk_iommu_flush_iotlb_all+0x20/0x80
iommu_create_device_direct_mappings.part.0+0x13c/0x230
iommu_setup_default_domain+0x29c/0x4d0
iommu_probe_device+0x12c/0x190
of_iommu_configure+0x140/0x208
of_dma_configure_id+0x19c/0x3c0
platform_dma_configure+0x38/0x88
really_probe+0x78/0x2c0
Check if the "bank" field has been filled in before actually attempting
the IOTLB flush to avoid it. The IOTLB is also flushed when the device
comes out of runtime suspend, so it should have a clean initial state.
Fixes: 08500c43d4f7 ("iommu/mediatek: Adjust the structure")
Signed-off-by: Chen-Yu Tsai <wenst@chromium.org>
Reviewed-by: Yong Wu <yong.wu@mediatek.com>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Link: https://lore.kernel.org/r/20230526085402.394239-1-wenst@chromium.org
Signed-off-by: Joerg Roedel <jroedel@suse.de>