Ken Lin [Wed, 26 Apr 2023 03:22:57 +0000 (03:22 +0000)]
tracing: Add missing spaces in trace_print_hex_seq()
If the buffer length is larger than 16 and concatenate is set to false,
there would be missing spaces every 16 bytes.
Example:
Before: c5 11 10 50 05 4d 31 40 00 40 00 40 00 4d 31 4000 40 00
After: c5 11 10 50 05 4d 31 40 00 40 00 40 00 4d 31 40 00 40 00
Link: https://lore.kernel.org/linux-trace-kernel/20230426032257.3157247-1-lyenting@google.com
Signed-off-by: Ken Lin <lyenting@google.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Tze-nan Wu [Wed, 26 Apr 2023 06:20:23 +0000 (14:20 +0800)]
ring-buffer: Ensure proper resetting of atomic variables in ring_buffer_reset_online_cpus
In ring_buffer_reset_online_cpus, the buffer_size_kb write operation
may permanently fail if the cpu_online_mask changes between two
for_each_online_buffer_cpu loops. The number of increases and decreases
on both cpu_buffer->resize_disabled and cpu_buffer->record_disabled may be
inconsistent, causing some CPUs to have non-zero values for these atomic
variables after the function returns.
This issue can be reproduced by "echo 0 > trace" while hotplugging cpu.
After reproducing success, we can find out buffer_size_kb will not be
functional anymore.
To prevent leaving 'resize_disabled' and 'record_disabled' non-zero after
ring_buffer_reset_online_cpus returns, we ensure that each atomic variable
has been set up before atomic_sub() to it.
Link: https://lore.kernel.org/linux-trace-kernel/20230426062027.17451-1-Tze-nan.Wu@mediatek.com
Cc: stable@vger.kernel.org
Cc: <mhiramat@kernel.org>
Cc: npiggin@gmail.com
Fixes: b23d7a5f4a07 ("ring-buffer: speed up buffer resets by avoiding synchronize_rcu for each CPU")
Reviewed-by: Cheng-Jui Wang <cheng-jui.wang@mediatek.com>
Signed-off-by: Tze-nan Wu <Tze-nan.Wu@mediatek.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Hao Zeng [Wed, 26 Apr 2023 01:05:27 +0000 (09:05 +0800)]
recordmcount: Fix memory leaks in the uwrite function
Common realloc mistake: 'file_append' nulled but not freed upon failure
Link: https://lkml.kernel.org/r/20230426010527.703093-1-zenghao@kylinos.cn
Signed-off-by: Hao Zeng <zenghao@kylinos.cn>
Suggested-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Beau Belgrave [Tue, 25 Apr 2023 22:51:07 +0000 (15:51 -0700)]
tracing/user_events: Limit max fault-in attempts
When event enablement changes, user_events attempts to update a bit in
the user process. If a fault is hit, an attempt to fault-in the page and
the write is retried if the page made it in. While this normally requires
a couple attempts, it is possible a bad user process could attempt to
cause infinite loops.
Ensure fault-in attempts either sync or async are limited to a max of 10
attempts for each update. When the max is hit, return -EFAULT so another
attempt is not made in all cases.
Link: https://lkml.kernel.org/r/20230425225107.8525-5-beaub@linux.microsoft.com
Suggested-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Beau Belgrave <beaub@linux.microsoft.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Beau Belgrave [Tue, 25 Apr 2023 22:51:06 +0000 (15:51 -0700)]
tracing/user_events: Prevent same address and bit per process
User processes register an address and bit pair for events. If the same
address and bit pair are registered multiple times in the same process,
it can cause undefined behavior when events are enabled/disabled.
When more than one are used, the bit could be turned off by another
event being disabled, while the original event is still enabled.
Prevent undefined behavior by checking the current mm to see if any
event has already been registered for the address and bit pair. Return
EADDRINUSE back to the user process if it's already being used.
Update ftrace self-test to ensure this occurs properly.
Link: https://lkml.kernel.org/r/20230425225107.8525-4-beaub@linux.microsoft.com
Suggested-by: Doug Cook <dcook@linux.microsoft.com>
Signed-off-by: Beau Belgrave <beaub@linux.microsoft.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Beau Belgrave [Tue, 25 Apr 2023 22:51:05 +0000 (15:51 -0700)]
tracing/user_events: Ensure bit is cleared on unregister
If an event is enabled and a user process unregisters user_events, the
bit is left set. Fix this by always clearing the bit in the user process
if unregister is successful.
Update abi self-test to ensure this occurs properly.
Link: https://lkml.kernel.org/r/20230425225107.8525-3-beaub@linux.microsoft.com
Suggested-by: Doug Cook <dcook@linux.microsoft.com>
Signed-off-by: Beau Belgrave <beaub@linux.microsoft.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Beau Belgrave [Tue, 25 Apr 2023 22:51:04 +0000 (15:51 -0700)]
tracing/user_events: Ensure write index cannot be negative
The write index indicates which event the data is for and accesses a
per-file array. The index is passed by user processes during write()
calls as the first 4 bytes. Ensure that it cannot be negative by
returning -EINVAL to prevent out of bounds accesses.
Update ftrace self-test to ensure this occurs properly.
Link: https://lkml.kernel.org/r/20230425225107.8525-2-beaub@linux.microsoft.com
Fixes: 7f5a08c79df3 ("user_events: Add minimal support for trace_event into ftrace")
Reported-by: Doug Cook <dcook@linux.microsoft.com>
Signed-off-by: Beau Belgrave <beaub@linux.microsoft.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Sergey Senozhatsky [Sat, 15 Apr 2023 10:01:10 +0000 (19:01 +0900)]
seq_buf: Add seq_buf_do_printk() helper
Sometimes we use seq_buf to format a string buffer, which
we then pass to printk(). However, in certain situations
the seq_buf string buffer can get too big, exceeding the
PRINTKRB_RECORD_MAX bytes limit, and causing printk() to
truncate the string.
Add a new seq_buf helper. This helper prints the seq_buf
string buffer line by line, using \n as a delimiter,
rather than passing the whole string buffer to printk()
at once.
Link: https://lkml.kernel.org/r/20230415100110.1419872-1-senozhatsky@chromium.org
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Tested-by: Yosry Ahmed <yosryahmed@google.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Beau Belgrave [Wed, 19 Apr 2023 21:41:40 +0000 (14:41 -0700)]
tracing: Fix print_fields() for __dyn_loc/__rel_loc
Both print_fields() and print_array() do not handle if dynamic data ends
at the last byte of the payload for both __dyn_loc and __rel_loc field
types. For __rel_loc, the offset was off by 4 bytes, leading to
incorrect strings and data being printed out. In print_array() the
buffer pos was missed from being advanced, which results in the first
payload byte being used as the offset base instead of the field offset.
Advance __rel_loc offset by 4 to ensure correct offset and advance pos
to the field offset to ensure correct data is displayed when printing
arrays. Change >= to > when checking if data is in-bounds, since it's
valid for dynamic data to include the last byte of the payload.
Example outputs for event format:
field:unsigned short common_type; offset:0; size:2; signed:0;
field:unsigned char common_flags; offset:2; size:1; signed:0;
field:unsigned char common_preempt_count; offset:3; size:1; signed:0;
field:int common_pid; offset:4; size:4; signed:1;
field:__rel_loc char text[]; offset:8; size:4; signed:1;
Output before:
tp_rel_loc: text=<OVERFLOW>
Output after:
tp_rel_loc: text=Test
Link: https://lkml.kernel.org/r/20230419214140.4158-3-beaub@linux.microsoft.com
Fixes: 80a76994b2d8 ("tracing: Add "fields" option to show raw trace event fields")
Reported-by: Doug Cook <dcook@linux.microsoft.com>
Signed-off-by: Beau Belgrave <beaub@linux.microsoft.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Beau Belgrave [Wed, 19 Apr 2023 21:41:39 +0000 (14:41 -0700)]
tracing/user_events: Set event filter_type from type
Users expect that events can be filtered by the kernel. User events
currently sets all event fields as FILTER_OTHER which limits to binary
filters only. When strings are being used, functionality is reduced.
Use filter_assign_type() to find the most appropriate filter
type for each field in user events to ensure full kernel capabilities.
Link: https://lkml.kernel.org/r/20230419214140.4158-2-beaub@linux.microsoft.com
Signed-off-by: Beau Belgrave <beaub@linux.microsoft.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Zheng Yejian [Fri, 14 Apr 2023 07:17:29 +0000 (15:17 +0800)]
ring-buffer: Clearly check null ptr returned by rb_set_head_page()
In error case, 'buffer_page' returned by rb_set_head_page() is NULL,
currently check '&buffer_page->list' is equivalent to check 'buffer_page'
due to 'list' is the first member of 'buffer_page', but suppose it is not
some time, 'head_page' would be wild memory while check would be bypassed.
Link: https://lore.kernel.org/linux-trace-kernel/20230414071729.57312-1-zhengyejian1@huawei.com
Cc: <mhiramat@kernel.org>
Signed-off-by: Zheng Yejian <zhengyejian1@huawei.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Steven Rostedt (Google) [Tue, 28 Mar 2023 15:10:57 +0000 (11:10 -0400)]
tracing: Unbreak user events
The user events was added a bit prematurely, and there were a few kernel
developers that had issues with it. The API also needed a bit of work to
make sure it would be stable. It was decided to make user events "broken"
until this was settled. Now it has a new API that appears to be as stable
as it will be without the use of a crystal ball. It's being used within
Microsoft as is, which means the API has had some testing in real world
use cases. It went through many discussions in the bi-weekly tracing
meetings, and there's been no more comments about updates.
I feel this is good to go.
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Steven Rostedt (Google) [Tue, 28 Mar 2023 19:14:13 +0000 (15:14 -0400)]
tracing/user_events: Use print_format_fields() for trace output
Currently, user events are shown using the "hex" output for "safety"
reasons as one cannot trust user events behaving nicely. But the hex
output is not the only utility for safe outputting of trace events. The
print_event_fields() is just as safe and gives user readable output.
Before:
example-839 [001] ..... 43.222244:
00000000: b1 06 00 00 47 03 00 00 00 00 00 00 ....G.......
example-839 [001] ..... 43.564433:
00000000: b1 06 00 00 47 03 00 00 01 00 00 00 ....G.......
example-839 [001] ..... 43.763917:
00000000: b1 06 00 00 47 03 00 00 02 00 00 00 ....G.......
example-839 [001] ..... 43.967929:
00000000: b1 06 00 00 47 03 00 00 03 00 00 00 ....G.......
After:
example-837 [006] ..... 55.739249: test: count=0x0 (0)
example-837 [006] ..... 111.104784: test: count=0x1 (1)
example-837 [006] ..... 111.268444: test: count=0x2 (2)
example-837 [006] ..... 111.416533: test: count=0x3 (3)
example-837 [006] ..... 111.542859: test: count=0x4 (4)
Link: https://lore.kernel.org/linux-trace-kernel/20230328151413.4770b8d7@gandalf.local.home
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Beau Belgrave <beaub@linux.microsoft.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Beau Belgrave [Tue, 28 Mar 2023 23:52:19 +0000 (16:52 -0700)]
tracing/user_events: Align structs with tabs for readability
Add tabs to make struct members easier to read and unify the style of
the code.
Link: https://lkml.kernel.org/r/20230328235219.203-13-beaub@linux.microsoft.com
Signed-off-by: Beau Belgrave <beaub@linux.microsoft.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Beau Belgrave [Tue, 28 Mar 2023 23:52:18 +0000 (16:52 -0700)]
tracing/user_events: Limit global user_event count
Operators want to be able to ensure enough tracepoints exist on the
system for kernel components as well as for user components. Since there
are only up to 64K events, by default allow up to half to be used by
user events.
Add a kernel sysctl parameter (kernel.user_events_max) to set a global
limit that is honored among all groups on the system. This ensures hard
limits can be setup to prevent user processes from consuming all event
IDs on the system.
Link: https://lkml.kernel.org/r/20230328235219.203-12-beaub@linux.microsoft.com
Signed-off-by: Beau Belgrave <beaub@linux.microsoft.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Beau Belgrave [Tue, 28 Mar 2023 23:52:17 +0000 (16:52 -0700)]
tracing/user_events: Charge event allocs to cgroups
Operators need a way to limit how much memory cgroups use. User events need
to be included into that accounting. Fix this by using GFP_KERNEL_ACCOUNT
for allocations generated by user programs for user_event tracing.
Link: https://lkml.kernel.org/r/20230328235219.203-11-beaub@linux.microsoft.com
Signed-off-by: Beau Belgrave <beaub@linux.microsoft.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Beau Belgrave [Tue, 28 Mar 2023 23:52:16 +0000 (16:52 -0700)]
tracing/user_events: Update documentation for ABI
The ABI for user_events has changed from mmap() based to remote writes.
Update the documentation to reflect these changes, add new section for
unregistering events since lifetime is now tied to tasks instead of
files.
Link: https://lkml.kernel.org/r/20230328235219.203-10-beaub@linux.microsoft.com
Signed-off-by: Beau Belgrave <beaub@linux.microsoft.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Beau Belgrave [Tue, 28 Mar 2023 23:52:15 +0000 (16:52 -0700)]
tracing/user_events: Use write ABI in example
The ABI has changed to use a remote write approach. Update the example
to show the expected use of this new ABI. Also remove debugfs
path and use tracefs to ensure example works in more environments.
Link: https://lkml.kernel.org/r/20230328235219.203-9-beaub@linux.microsoft.com
Signed-off-by: Beau Belgrave <beaub@linux.microsoft.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Beau Belgrave [Tue, 28 Mar 2023 23:52:14 +0000 (16:52 -0700)]
tracing/user_events: Add ABI self-test
Add ABI specific self-test to ensure enablements work in various
scenarios such as fork, VM_CLONE, and basic event enable/disable.
Ensure ABI contracts/limits are also being upheld, such as bit limits
and data size limits.
Link: https://lkml.kernel.org/r/20230328235219.203-8-beaub@linux.microsoft.com
Signed-off-by: Beau Belgrave <beaub@linux.microsoft.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Beau Belgrave [Tue, 28 Mar 2023 23:52:13 +0000 (16:52 -0700)]
tracing/user_events: Update self-tests to write ABI
ABI has been changed to remote writes, update existing test cases to use
this new ABI to ensure existing functionality continues to work.
Link: https://lkml.kernel.org/r/20230328235219.203-7-beaub@linux.microsoft.com
Signed-off-by: Beau Belgrave <beaub@linux.microsoft.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Beau Belgrave [Tue, 28 Mar 2023 23:52:12 +0000 (16:52 -0700)]
tracing/user_events: Add ioctl for disabling addresses
Enablements are now tracked by the lifetime of the task/mm. User
processes need to be able to disable their addresses if tracing is
requested to be turned off. Before unmapping the page would suffice.
However, we now need a stronger contract. Add an ioctl to enable this.
A new flag bit is added, freeing, to user_event_enabler to ensure that
if the event is attempted to be removed while a fault is being handled
that the remove is delayed until after the fault is reattempted.
Link: https://lkml.kernel.org/r/20230328235219.203-6-beaub@linux.microsoft.com
Signed-off-by: Beau Belgrave <beaub@linux.microsoft.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Beau Belgrave [Tue, 28 Mar 2023 23:52:11 +0000 (16:52 -0700)]
tracing/user_events: Fixup enable faults asyncly
When events are enabled within the various tracing facilities, such as
ftrace/perf, the event_mutex is held. As events are enabled pages are
accessed. We do not want page faults to occur under this lock. Instead
queue the fault to a workqueue to be handled in a process context safe
way without the lock.
The enable address is marked faulting while the async fault-in occurs.
This ensures that we don't attempt to fault-in more than is necessary.
Once the page has been faulted in, an address write is re-attempted.
If the page couldn't fault-in, then we wait until the next time the
event is enabled to prevent any potential infinite loops.
Link: https://lkml.kernel.org/r/20230328235219.203-5-beaub@linux.microsoft.com
Signed-off-by: Beau Belgrave <beaub@linux.microsoft.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Beau Belgrave [Tue, 28 Mar 2023 23:52:10 +0000 (16:52 -0700)]
tracing/user_events: Use remote writes for event enablement
As part of the discussions for user_events aligned with user space
tracers, it was determined that user programs should register a aligned
value to set or clear a bit when an event becomes enabled. Currently a
shared page is being used that requires mmap(). Remove the shared page
implementation and move to a user registered address implementation.
In this new model during the event registration from user programs 3 new
values are specified. The first is the address to update when the event
is either enabled or disabled. The second is the bit to set/clear to
reflect the event being enabled. The third is the size of the value at
the specified address.
This allows for a local 32/64-bit value in user programs to support
both kernel and user tracers. As an example, setting bit 31 for kernel
tracers when the event becomes enabled allows for user tracers to use
the other bits for ref counts or other flags. The kernel side updates
the bit atomically, user programs need to also update these values
atomically.
User provided addresses must be aligned on a natural boundary, this
allows for single page checking and prevents odd behaviors such as a
enable value straddling 2 pages instead of a single page. Currently
page faults are only logged, future patches will handle these.
Link: https://lkml.kernel.org/r/20230328235219.203-4-beaub@linux.microsoft.com
Suggested-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Beau Belgrave <beaub@linux.microsoft.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Beau Belgrave [Tue, 28 Mar 2023 23:52:09 +0000 (16:52 -0700)]
tracing/user_events: Track fork/exec/exit for mm lifetime
During tracefs discussions it was decided instead of requiring a mapping
within a user-process to track the lifetime of memory descriptors we
should hook the appropriate calls. Do this by adding the minimal stubs
required for task fork, exec, and exit. Currently this is just a NOP.
Future patches will implement these calls fully.
Link: https://lkml.kernel.org/r/20230328235219.203-3-beaub@linux.microsoft.com
Suggested-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Beau Belgrave <beaub@linux.microsoft.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Beau Belgrave [Tue, 28 Mar 2023 23:52:08 +0000 (16:52 -0700)]
tracing/user_events: Split header into uapi and kernel
The UAPI parts need to be split out from the kernel parts of user_events
now that other parts of the kernel will reference it. Do so by moving
the existing include/linux/user_events.h into
include/uapi/linux/user_events.h.
Link: https://lkml.kernel.org/r/20230328235219.203-2-beaub@linux.microsoft.com
Signed-off-by: Beau Belgrave <beaub@linux.microsoft.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Steven Rostedt (Google) [Tue, 28 Mar 2023 18:51:56 +0000 (14:51 -0400)]
tracing: Add "fields" option to show raw trace event fields
The hex, raw and bin formats come from the old PREEMPT_RT patch set
latency tracer. That actually gave real alternatives to reading the ascii
buffer. But they have started to bit rot and they do not give a good
representation of the tracing data.
Add "fields" option that will read the trace event fields and parse the
data from how the fields are defined:
With "fields" = 0 (default)
echo 1 > events/sched/sched_switch/enable
cat trace
<idle>-0 [003] d..2. 540.078653: sched_switch: prev_comm=swapper/3 prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=kworker/3:1 next_pid=83 next_prio=120
kworker/3:1-83 [003] d..2. 540.078860: sched_switch: prev_comm=kworker/3:1 prev_pid=83 prev_prio=120 prev_state=I ==> next_comm=swapper/3 next_pid=0 next_prio=120
<idle>-0 [003] d..2. 540.206423: sched_switch: prev_comm=swapper/3 prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=sshd next_pid=807 next_prio=120
sshd-807 [003] d..2. 540.206531: sched_switch: prev_comm=sshd prev_pid=807 prev_prio=120 prev_state=S ==> next_comm=swapper/3 next_pid=0 next_prio=120
<idle>-0 [001] d..2. 540.206597: sched_switch: prev_comm=swapper/1 prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=kworker/u16:4 next_pid=58 next_prio=120
kworker/u16:4-58 [001] d..2. 540.206617: sched_switch: prev_comm=kworker/u16:4 prev_pid=58 prev_prio=120 prev_state=I ==> next_comm=bash next_pid=830 next_prio=120
bash-830 [001] d..2. 540.206678: sched_switch: prev_comm=bash prev_pid=830 prev_prio=120 prev_state=R ==> next_comm=kworker/u16:4 next_pid=58 next_prio=120
kworker/u16:4-58 [001] d..2. 540.206696: sched_switch: prev_comm=kworker/u16:4 prev_pid=58 prev_prio=120 prev_state=I ==> next_comm=bash next_pid=830 next_prio=120
bash-830 [001] d..2. 540.206713: sched_switch: prev_comm=bash prev_pid=830 prev_prio=120 prev_state=R ==> next_comm=kworker/u16:4 next_pid=58 next_prio=120
echo 1 > options/fields
<...>-998 [002] d..2. 538.643732: sched_switch: next_prio=0x78 (120) next_pid=0x0 (0) next_comm=swapper/2 prev_state=0x20 (32) prev_prio=0x78 (120) prev_pid=0x3e6 (998) prev_comm=trace-cmd
<idle>-0 [001] d..2. 538.643806: sched_switch: next_prio=0x78 (120) next_pid=0x33e (830) next_comm=bash prev_state=0x0 (0) prev_prio=0x78 (120) prev_pid=0x0 (0) prev_comm=swapper/1
bash-830 [001] d..2. 538.644106: sched_switch: next_prio=0x78 (120) next_pid=0x3a (58) next_comm=kworker/u16:4 prev_state=0x0 (0) prev_prio=0x78 (120) prev_pid=0x33e (830) prev_comm=bash
kworker/u16:4-58 [001] d..2. 538.644130: sched_switch: next_prio=0x78 (120) next_pid=0x33e (830) next_comm=bash prev_state=0x80 (128) prev_prio=0x78 (120) prev_pid=0x3a (58) prev_comm=kworker/u16:4
bash-830 [001] d..2. 538.644180: sched_switch: next_prio=0x78 (120) next_pid=0x3a (58) next_comm=kworker/u16:4 prev_state=0x0 (0) prev_prio=0x78 (120) prev_pid=0x33e (830) prev_comm=bash
kworker/u16:4-58 [001] d..2. 538.644185: sched_switch: next_prio=0x78 (120) next_pid=0x33e (830) next_comm=bash prev_state=0x80 (128) prev_prio=0x78 (120) prev_pid=0x3a (58) prev_comm=kworker/u16:4
bash-830 [001] d..2. 538.644204: sched_switch: next_prio=0x78 (120) next_pid=0x0 (0) next_comm=swapper/1 prev_state=0x1 (1) prev_prio=0x78 (120) prev_pid=0x33e (830) prev_comm=bash
<idle>-0 [003] d..2. 538.644211: sched_switch: next_prio=0x78 (120) next_pid=0x327 (807) next_comm=sshd prev_state=0x0 (0) prev_prio=0x78 (120) prev_pid=0x0 (0) prev_comm=swapper/3
sshd-807 [003] d..2. 538.644340: sched_switch: next_prio=0x78 (120) next_pid=0x0 (0) next_comm=swapper/3 prev_state=0x1 (1) prev_prio=0x78 (120) prev_pid=0x327 (807) prev_comm=sshd
It traces the data safely without using the trace print formatting.
Link: https://lore.kernel.org/linux-trace-kernel/20230328145156.497651be@gandalf.local.home
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Beau Belgrave <beaub@linux.microsoft.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Ross Zwisler [Mon, 13 Mar 2023 21:17:45 +0000 (15:17 -0600)]
tools/kvm_stat: use canonical ftrace path
The canonical location for the tracefs filesystem is at /sys/kernel/tracing.
But, from Documentation/trace/ftrace.rst:
Before 4.1, all ftrace tracing control files were within the debugfs
file system, which is typically located at /sys/kernel/debug/tracing.
For backward compatibility, when mounting the debugfs file system,
the tracefs file system will be automatically mounted at:
/sys/kernel/debug/tracing
A comment in kvm_stat still refers to this older debugfs path, so let's
update it to avoid confusion.
Link: https://lkml.kernel.org/r/20230313211746.1541525-3-zwisler@kernel.org
Cc: "Tobin C. Harding" <me@tobin.cc>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Tycho Andersen <tycho@tycho.pizza>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Reviewed-by: Mukesh Ojha <quic_mojha@quicinc.com>
Signed-off-by: Ross Zwisler <zwisler@google.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Ross Zwisler [Mon, 13 Mar 2023 21:17:44 +0000 (15:17 -0600)]
leaking_addresses: also skip canonical ftrace path
The canonical location for the tracefs filesystem is at /sys/kernel/tracing.
But, from Documentation/trace/ftrace.rst:
Before 4.1, all ftrace tracing control files were within the debugfs
file system, which is typically located at /sys/kernel/debug/tracing.
For backward compatibility, when mounting the debugfs file system,
the tracefs file system will be automatically mounted at:
/sys/kernel/debug/tracing
scripts/leaking_addresses.pl only skipped this older debugfs path, so
let's add the canonical path as well.
Link: https://lkml.kernel.org/r/20230313211746.1541525-2-zwisler@kernel.org
Cc: "Tobin C. Harding" <me@tobin.cc>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Shuah Khan <shuah@kernel.org>
Acked-by: Tycho Andersen <tycho@tycho.pizza>
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Ross Zwisler <zwisler@google.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Ross Zwisler [Mon, 13 Mar 2023 21:17:43 +0000 (15:17 -0600)]
selftests: use canonical ftrace path
The canonical location for the tracefs filesystem is at /sys/kernel/tracing.
But, from Documentation/trace/ftrace.rst:
Before 4.1, all ftrace tracing control files were within the debugfs
file system, which is typically located at /sys/kernel/debug/tracing.
For backward compatibility, when mounting the debugfs file system,
the tracefs file system will be automatically mounted at:
/sys/kernel/debug/tracing
A few spots in tools/testing/selftests still refer to this older debugfs
path, so let's update them to avoid confusion.
Link: https://lkml.kernel.org/r/20230313211746.1541525-1-zwisler@kernel.org
Cc: "Tobin C. Harding" <me@tobin.cc>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Tycho Andersen <tycho@tycho.pizza>
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Reviewed-by: Mukesh Ojha <quic_mojha@quicinc.com>
Signed-off-by: Ross Zwisler <zwisler@google.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Masami Hiramatsu (Google) [Wed, 1 Feb 2023 15:56:55 +0000 (00:56 +0900)]
docs: tracing: Update fprobe documentation
Update fprobe.rst for
- the private entry_data argument
- the return value of the entry handler
- the nr_rethook_node field.
Link: https://lkml.kernel.org/r/167526701579.433354.3057889264263546659.stgit@mhiramat.roam.corp.google.com
Cc: Florent Revest <revest@chromium.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Masami Hiramatsu (Google) [Wed, 1 Feb 2023 15:56:46 +0000 (00:56 +0900)]
lib/test_fprobe: Add a testcase for skipping exit_handler
Add a testcase for skipping exit_handler if entry_handler
returns !0.
Link: https://lkml.kernel.org/r/167526700658.433354.12922388040490848613.stgit@mhiramat.roam.corp.google.com
Cc: Florent Revest <revest@chromium.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Masami Hiramatsu (Google) [Wed, 1 Feb 2023 15:56:38 +0000 (00:56 +0900)]
fprobe: Skip exit_handler if entry_handler returns !0
Skip hooking function return and calling exit_handler if the
entry_handler() returns !0.
Link: https://lkml.kernel.org/r/167526699798.433354.10998365726830117303.stgit@mhiramat.roam.corp.google.com
Cc: Florent Revest <revest@chromium.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Masami Hiramatsu (Google) [Wed, 1 Feb 2023 15:56:28 +0000 (00:56 +0900)]
lib/test_fprobe: Add a test case for nr_maxactive
Add a test case for nr_maxactive. If the number of active
functions is more than nr_maxactive, it must be skipped.
Link: https://lkml.kernel.org/r/167526698856.433354.4430007340787176666.stgit@mhiramat.roam.corp.google.com
Cc: Florent Revest <revest@chromium.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Masami Hiramatsu (Google) [Wed, 1 Feb 2023 15:56:19 +0000 (00:56 +0900)]
fprobe: Add nr_maxactive to specify rethook_node pool size
Add nr_maxactive to specify rethook_node pool size. This means
the maximum number of actively running target functions concurrently
for probing by exit_handler. Note that if the running function is
preempted or sleep, it is still counted as 'active'.
Link: https://lkml.kernel.org/r/167526697917.433354.17779774988245113106.stgit@mhiramat.roam.corp.google.com
Cc: Florent Revest <revest@chromium.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Masami Hiramatsu (Google) [Wed, 1 Feb 2023 15:56:10 +0000 (00:56 +0900)]
lib/test_fprobe: Add private entry_data testcases
Add test cases for checking whether private entry_data is
correctly passed or not.
Link: https://lkml.kernel.org/r/167526697074.433354.17790288501657876219.stgit@mhiramat.roam.corp.google.com
Cc: Florent Revest <revest@chromium.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Masami Hiramatsu (Google) [Wed, 1 Feb 2023 15:56:01 +0000 (00:56 +0900)]
fprobe: Pass entry_data to handlers
Pass the private entry_data to the entry and exit handlers so that
they can share the context data, something like saved function
arguments etc.
User must specify the private entry_data size by @entry_data_size
field before registering the fprobe.
Link: https://lkml.kernel.org/r/167526696173.433354.17408372048319432574.stgit@mhiramat.roam.corp.google.com
Cc: Florent Revest <revest@chromium.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Steven Rostedt (Google) [Tue, 24 Jan 2023 14:56:53 +0000 (09:56 -0500)]
ftrace: Show a list of all functions that have ever been enabled
When debugging a crash that appears to be related to ftrace, but not for
sure, it is useful to know if a function was ever enabled by ftrace or
not. It could be that a BPF program was attached to it, or possibly a live
patch.
We are having crashes in the field where this information is not always
known. But having ftrace set a flag if a function has ever been attached
since boot up helps tremendously in trying to know if a crash had to do
with something using ftrace.
For analyzing crashes, the use of a kdump image can have access to the
flags. When looking at issues where the kernel did not panic, the
touched_functions file can simply be used.
Link: https://lore.kernel.org/linux-trace-kernel/20230124095653.6fd1640e@gandalf.local.home
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Tested-by: Mark Rutland <mark.rutland@arm.com>
Tested-by: Chris Li <chriscli@google.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Uros Bizjak [Sun, 5 Mar 2023 15:55:32 +0000 (16:55 +0100)]
ring_buffer: Use try_cmpxchg instead of cmpxchg
Use try_cmpxchg instead of cmpxchg (*ptr, old, new) == old.
x86 CMPXCHG instruction returns success in ZF flag, so this change
saves a compare after cmpxchg (and related move instruction in
front of cmpxchg).
Also, try_cmpxchg implicitly assigns old *ptr value to "old" when cmpxchg
fails. There is no need to re-read the value in the loop.
No functional change intended.
Link: https://lkml.kernel.org/r/20230305155532.5549-4-ubizjak@gmail.com
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Acked-by: Mukesh Ojha <quic_mojha@quicinc.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Uros Bizjak [Sun, 5 Mar 2023 15:55:31 +0000 (16:55 +0100)]
ring_buffer: Change some static functions to bool
The return values of some functions are of boolean type. Change the
type of these function to bool and adjust their return values. Also
change type of some internal varibles to bool.
No functional change intended.
Link: https://lkml.kernel.org/r/20230305155532.5549-3-ubizjak@gmail.com
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Reviewed-by: Mukesh Ojha <quic_mojha@quicinc.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Uros Bizjak [Sun, 5 Mar 2023 15:55:30 +0000 (16:55 +0100)]
ring_buffer: Change some static functions to void
The results of some static functions are not used. Change the
type of these function to void and remove unnecessary returns.
No functional change intended.
Link: https://lkml.kernel.org/r/20230305155532.5549-2-ubizjak@gmail.com
Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
Reviewed-by: Mukesh Ojha <quic_mojha@quicinc.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Mark Rutland [Tue, 21 Mar 2023 14:04:24 +0000 (15:04 +0100)]
ftrace: selftest: remove broken trace_direct_tramp
The ftrace selftest code has a trace_direct_tramp() function which it
uses as a direct call trampoline. This happens to work on x86, since the
direct call's return address is in the usual place, and can be returned
to via a RET, but in general the calling convention for direct calls is
different from regular function calls, and requires a trampoline written
in assembly.
On s390, regular function calls place the return address in %r14, and an
ftrace patch-site in an instrumented function places the trampoline's
return address (which is within the instrumented function) in %r0,
preserving the original %r14 value in-place. As a regular C function
will return to the address in %r14, using a C function as the trampoline
results in the trampoline returning to the caller of the instrumented
function, skipping the body of the instrumented function.
Note that the s390 issue is not detcted by the ftrace selftest code, as
the instrumented function is trivial, and returning back into the caller
happens to be equivalent.
On arm64, regular function calls place the return address in x30, and
an ftrace patch-site in an instrumented function saves this into r9
and places the trampoline's return address (within the instrumented
function) in x30. A regular C function will return to the address in
x30, but will not restore x9 into x30. Consequently, using a C function
as the trampoline results in returning to the trampoline's return
address having corrupted x30, such that when the instrumented function
returns, it will return back into itself.
To avoid future issues in this area, remove the trace_direct_tramp()
function, and require that each architecture with direct calls provides
a stub trampoline, named ftrace_stub_direct_tramp. This can be written
to handle the architecture's trampoline calling convention, and in
future could be used elsewhere (e.g. in the ftrace ops sample, to
measure the overhead of direct calls), so we may as well always build it
in.
Link: https://lkml.kernel.org/r/20230321140424.345218-8-revest@chromium.org
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Li Huafei <lihuafei1@huawei.com>
Cc: Xu Kuohai <xukuohai@huawei.com>
Signed-off-by: Florent Revest <revest@chromium.org>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Florent Revest [Tue, 21 Mar 2023 14:04:23 +0000 (15:04 +0100)]
ftrace: Make DIRECT_CALLS work WITH_ARGS and !WITH_REGS
Direct called trampolines can be called in two ways:
- either from the ftrace callsite. In this case, they do not access any
struct ftrace_regs nor pt_regs
- Or, if a ftrace ops is also attached, from the end of a ftrace
trampoline. In this case, the call_direct_funcs ops is in charge of
setting the direct call trampoline's address in a struct ftrace_regs
Since:
commit
9705bc709604 ("ftrace: pass fregs to arch_ftrace_set_direct_caller()")
The later case no longer requires a full pt_regs. It only needs a struct
ftrace_regs so DIRECT_CALLS can work with both WITH_ARGS or WITH_REGS.
With architectures like arm64 already abandoning WITH_REGS in favor of
WITH_ARGS, it's important to have DIRECT_CALLS work WITH_ARGS only.
Link: https://lkml.kernel.org/r/20230321140424.345218-7-revest@chromium.org
Signed-off-by: Florent Revest <revest@chromium.org>
Co-developed-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Florent Revest [Tue, 21 Mar 2023 14:04:22 +0000 (15:04 +0100)]
ftrace: Store direct called addresses in their ops
All direct calls are now registered using the register_ftrace_direct API
so each ops can jump to only one direct-called trampoline.
By storing the direct called trampoline address directly in the ops we
can save one hashmap lookup in the direct call ops and implement arm64
direct calls on top of call ops.
Link: https://lkml.kernel.org/r/20230321140424.345218-6-revest@chromium.org
Signed-off-by: Florent Revest <revest@chromium.org>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Florent Revest [Tue, 21 Mar 2023 14:04:21 +0000 (15:04 +0100)]
ftrace: Rename _ftrace_direct_multi APIs to _ftrace_direct APIs
Now that the original _ftrace_direct APIs are gone, the "_multi"
suffixes only add confusion.
Link: https://lkml.kernel.org/r/20230321140424.345218-5-revest@chromium.org
Signed-off-by: Florent Revest <revest@chromium.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Tested-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Florent Revest [Tue, 21 Mar 2023 14:04:20 +0000 (15:04 +0100)]
ftrace: Remove the legacy _ftrace_direct API
This API relies on a single global ops, used for all direct calls
registered with it. However, to implement arm64 direct calls, we need
each ops to point to a single direct call trampoline.
Link: https://lkml.kernel.org/r/20230321140424.345218-4-revest@chromium.org
Signed-off-by: Florent Revest <revest@chromium.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Tested-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Florent Revest [Tue, 21 Mar 2023 14:04:19 +0000 (15:04 +0100)]
ftrace: Replace uses of _ftrace_direct APIs with _ftrace_direct_multi
The _multi API requires that users keep their own ops but can enforce
that an op is only associated to one direct call.
Link: https://lkml.kernel.org/r/20230321140424.345218-3-revest@chromium.org
Signed-off-by: Florent Revest <revest@chromium.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Tested-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Florent Revest [Tue, 21 Mar 2023 14:04:18 +0000 (15:04 +0100)]
ftrace: Let unregister_ftrace_direct_multi() call ftrace_free_filter()
A common pattern when using the ftrace_direct_multi API is to unregister
the ops and also immediately free its filter. We've noticed it's very
easy for users to miss calling ftrace_free_filter().
This adds a "free_filters" argument to unregister_ftrace_direct_multi()
to both remind the user they should free filters and also to make their
life easier.
Link: https://lkml.kernel.org/r/20230321140424.345218-2-revest@chromium.org
Suggested-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Florent Revest <revest@chromium.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Linus Torvalds [Sun, 19 Mar 2023 20:27:55 +0000 (13:27 -0700)]
Linux 6.3-rc3
Linus Torvalds [Sun, 19 Mar 2023 17:46:02 +0000 (10:46 -0700)]
Merge tag 'trace-v6.3-rc2' of git://git./linux/kernel/git/trace/linux-trace
Pull tracing fixes from Steven Rostedt:
- Fix setting affinity of hwlat threads in containers
Using sched_set_affinity() has unwanted side effects when being
called within a container. Use set_cpus_allowed_ptr() instead
- Fix per cpu thread management of the hwlat tracer:
- Do not start per_cpu threads if one is already running for the CPU
- When starting per_cpu threads, do not clear the kthread variable
as it may already be set to running per cpu threads
- Fix return value for test_gen_kprobe_cmd()
On error the return value was overwritten by being set to the result
of the call from kprobe_event_delete(), which would likely succeed,
and thus have the function return success
- Fix splice() reads from the trace file that was broken by commit
36e2c7421f02 ("fs: don't allow splice read/write without explicit
ops")
- Remove obsolete and confusing comment in ring_buffer.c
The original design of the ring buffer used struct page flags for
tricks to optimize, which was shortly removed due to them being
tricks. But a comment for those tricks remained
- Set local functions and variables to static
* tag 'trace-v6.3-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
tracing/hwlat: Replace sched_setaffinity with set_cpus_allowed_ptr
ring-buffer: remove obsolete comment for free_buffer_page()
tracing: Make splice_read available again
ftrace: Set direct_ops storage-class-specifier to static
trace/hwlat: Do not start per-cpu thread if it is already running
trace/hwlat: Do not wipe the contents of per-cpu thread data
tracing/osnoise: set several trace_osnoise.c variables storage-class-specifier to static
tracing: Fix wrong return in kprobe_event_gen_test.c
Costa Shulyupin [Thu, 16 Mar 2023 14:45:35 +0000 (16:45 +0200)]
tracing/hwlat: Replace sched_setaffinity with set_cpus_allowed_ptr
There is a problem with the behavior of hwlat in a container,
resulting in incorrect output. A warning message is generated:
"cpumask changed while in round-robin mode, switching to mode none",
and the tracing_cpumask is ignored. This issue arises because
the kernel thread, hwlatd, is not a part of the container, and
the function sched_setaffinity is unable to locate it using its PID.
Additionally, the task_struct of hwlatd is already known.
Ultimately, the function set_cpus_allowed_ptr achieves
the same outcome as sched_setaffinity, but employs task_struct
instead of PID.
Test case:
# cd /sys/kernel/tracing
# echo 0 > tracing_on
# echo round-robin > hwlat_detector/mode
# echo hwlat > current_tracer
# unshare --fork --pid bash -c 'echo 1 > tracing_on'
# dmesg -c
Actual behavior:
[573502.809060] hwlat_detector: cpumask changed while in round-robin mode, switching to mode none
Link: https://lore.kernel.org/linux-trace-kernel/20230316144535.1004952-1-costa.shul@redhat.com
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Fixes: 0330f7aa8ee63 ("tracing: Have hwlat trace migrate across tracing_cpumask CPUs")
Signed-off-by: Costa Shulyupin <costa.shul@redhat.com>
Acked-by: Daniel Bristot de Oliveira <bristot@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Vlastimil Babka [Wed, 15 Mar 2023 14:24:46 +0000 (15:24 +0100)]
ring-buffer: remove obsolete comment for free_buffer_page()
The comment refers to mm/slob.c which is being removed. It comes from
commit
ed56829cb319 ("ring_buffer: reset buffer page when freeing") and
according to Steven the borrowed code was a page mapcount and mapping
reset, which was later removed by commit
e4c2ce82ca27 ("ring_buffer:
allocate buffer page pointer"). Thus the comment is not accurate anyway,
remove it.
Link: https://lore.kernel.org/linux-trace-kernel/20230315142446.27040-1-vbabka@suse.cz
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Ingo Molnar <mingo@elte.hu>
Reported-by: Mike Rapoport <mike.rapoport@gmail.com>
Suggested-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Fixes: e4c2ce82ca27 ("ring_buffer: allocate buffer page pointer")
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Mukesh Ojha <quic_mojha@quicinc.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Sung-hun Kim [Tue, 14 Mar 2023 01:37:07 +0000 (10:37 +0900)]
tracing: Make splice_read available again
Since the commit
36e2c7421f02 ("fs: don't allow splice read/write
without explicit ops") is applied to the kernel, splice() and
sendfile() calls on the trace file (/sys/kernel/debug/tracing
/trace) return EINVAL.
This patch restores these system calls by initializing splice_read
in file_operations of the trace file. This patch only enables such
functionalities for the read case.
Link: https://lore.kernel.org/linux-trace-kernel/20230314013707.28814-1-sfoon.kim@samsung.com
Cc: stable@vger.kernel.org
Fixes: 36e2c7421f02 ("fs: don't allow splice read/write without explicit ops")
Signed-off-by: Sung-hun Kim <sfoon.kim@samsung.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Linus Torvalds [Sun, 19 Mar 2023 17:09:58 +0000 (10:09 -0700)]
Merge tag 'tty-6.3-rc3' of git://git./linux/kernel/git/gregkh/tty
Pull tty/serial driver fixes from Greg KH:
"Here are some small tty and serial driver fixes for 6.3-rc3 to resolve
some reported issues.
They include:
- 8250 driver Kconfig issue pointed out by you that showed up in -rc1
- qcom-geni serial driver fixes
- various 8250 driver fixes for reported problems
- fsl_lpuart driver fixes
- serdev fix for regression in -rc1
- vt.c bugfix
All have been in linux-next for over a week with no reported problems"
* tag 'tty-6.3-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
tty: vt: protect KD_FONT_OP_GET_TALL from unbound access
serial: qcom-geni: drop bogus uart_write_wakeup()
serial: qcom-geni: fix mapping of empty DMA buffer
serial: qcom-geni: fix DMA mapping leak on shutdown
serial: qcom-geni: fix console shutdown hang
serdev: Set fwnode for serdev devices
tty: serial: fsl_lpuart: fix race on RX DMA shutdown
serial: 8250_pci1xxxx: Disable SERIAL_8250_PCI1XXXX config by default
serial: 8250_fsl: fix handle_irq locking
serial: 8250_em: Fix UART port type
serial: 8250: ASPEED_VUART: select REGMAP instead of depending on it
tty: serial: fsl_lpuart: skip waiting for transmission complete when UARTCTRL_SBK is asserted
Revert "tty: serial: fsl_lpuart: adjust SERIAL_FSL_LPUART_CONSOLE config dependency"
Linus Torvalds [Sun, 19 Mar 2023 17:04:58 +0000 (10:04 -0700)]
Merge tag 'char-misc-6.3-rc3' of git://git./linux/kernel/git/gregkh/char-misc
Pull char/misc driver fixes from Greg KH:
"Here are a few small char/misc/other driver subsystem patches to
resolve reported problems for 6.3-rc3.
Included in here are:
- Interconnect driver fixes for reported problems
- Memory driver fixes for reported problems
- nvmem core fix
- firmware driver fix for reported problem
All of these have been in linux-next for a while with no reported
issues"
* tag 'char-misc-6.3-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (23 commits)
memory: tegra30-emc: fix interconnect registration race
memory: tegra20-emc: fix interconnect registration race
memory: tegra124-emc: fix interconnect registration race
memory: tegra: fix interconnect registration race
interconnect: exynos: drop redundant link destroy
interconnect: exynos: fix registration race
interconnect: exynos: fix node leak in probe PM QoS error path
interconnect: qcom: msm8974: fix registration race
interconnect: qcom: rpmh: fix registration race
interconnect: qcom: rpmh: fix probe child-node error handling
interconnect: qcom: rpm: fix registration race
nvmem: core: return -ENOENT if nvmem cell is not found
firmware: xilinx: don't make a sleepable memory allocation from an atomic context
interconnect: qcom: rpm: fix probe child-node error handling
interconnect: qcom: osm-l3: fix registration race
interconnect: imx: fix registration race
interconnect: fix provider registration API
interconnect: fix icc_provider_del() error handling
interconnect: fix mem leak when freeing nodes
interconnect: qcom: qcm2290: Fix MASTER_SNOC_BIMC_NRT
...
Linus Torvalds [Sun, 19 Mar 2023 16:57:53 +0000 (09:57 -0700)]
Merge tag 'ras_urgent_for_v6.3_rc3' of git://git./linux/kernel/git/tip/tip
Pull RAS fix from Borislav Petkov:
- Flush out logged errors immediately after MCA banks configuration
changes over sysfs have been done instead of waiting until something
else triggers the workqueue later - another error or the polling
interval cycle is reached
* tag 'ras_urgent_for_v6.3_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/mce: Make sure logged MCEs are processed after sysfs update
Linus Torvalds [Sun, 19 Mar 2023 16:47:55 +0000 (09:47 -0700)]
Merge tag 'perf_urgent_for_v6.3_rc3' of git://git./linux/kernel/git/tip/tip
Pull perf fixes from Borislav Petkov:
- Check whether sibling events have been deactivated before adding them
to groups
- Update the proper event time tracking variable depending on the event
type
- Fix a memory overwrite issue due to using the wrong function argument
when outputting perf events
* tag 'perf_urgent_for_v6.3_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf: Fix check before add_event_to_groups() in perf_group_detach()
perf: fix perf_event_context->time
perf/core: Fix perf_output_begin parameter is incorrectly invoked in perf_event_bpf_output
Linus Torvalds [Sun, 19 Mar 2023 16:43:41 +0000 (09:43 -0700)]
Merge tag 'x86_urgent_for_v6.3_rc3' of git://git./linux/kernel/git/tip/tip
Pull x86 fixes from Borislav Petkov:
"There's a little bit more 'movement' in there for my taste but it
needs to happen and should make the code better after it.
- Check cmdline_find_option()'s return value before further
processing
- Clear temporary storage in the resctrl code to prevent access to an
unexistent MSR
- Add a simple throttling mechanism to protect the hypervisor from
potentially malicious SEV guests issuing requests in rapid
succession.
In order to not jeopardize the sanity of everyone involved in
maintaining this code, the request issuing side has received a
cleanup, split in more or less trivial, small and digestible
pieces. Otherwise, the code was threatening to become an
unmaintainable mess.
Therefore, that cleanup is marked indirectly also for stable so
that there's no differences between the upstream code and the
stable variant when it comes down to backporting more there"
* tag 'x86_urgent_for_v6.3_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/mm: Fix use of uninitialized buffer in sme_enable()
x86/resctrl: Clear staged_config[] before and after it is used
virt/coco/sev-guest: Add throttling awareness
virt/coco/sev-guest: Convert the sw_exit_info_2 checking to a switch-case
virt/coco/sev-guest: Do some code style cleanups
virt/coco/sev-guest: Carve out the request issuing logic into a helper
virt/coco/sev-guest: Remove the disable_vmpck label in handle_guest_request()
virt/coco/sev-guest: Simplify extended guest request handling
virt/coco/sev-guest: Check SEV_SNP attribute at probe time
Linus Torvalds [Sun, 19 Mar 2023 16:38:26 +0000 (09:38 -0700)]
Merge tag 'ext4_for_linus_urgent' of git://git./linux/kernel/git/tytso/ext4
Pull ext4 fix from Ted Ts'o:
"Fix a double unlock bug on an error path in ext4, found by smatch and
syzkaller"
* tag 'ext4_for_linus_urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
ext4: fix possible double unlock when moving a directory
Tom Rix [Sat, 11 Mar 2023 13:51:13 +0000 (08:51 -0500)]
ftrace: Set direct_ops storage-class-specifier to static
smatch reports this warning
kernel/trace/ftrace.c:2594:19: warning:
symbol 'direct_ops' was not declared. Should it be static?
The variable direct_ops is only used in ftrace.c, so it should be static
Link: https://lore.kernel.org/linux-trace-kernel/20230311135113.711824-1-trix@redhat.com
Signed-off-by: Tom Rix <trix@redhat.com>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Tero Kristo [Fri, 10 Mar 2023 10:04:51 +0000 (12:04 +0200)]
trace/hwlat: Do not start per-cpu thread if it is already running
The hwlatd tracer will end up starting multiple per-cpu threads with
the following script:
#!/bin/sh
cd /sys/kernel/debug/tracing
echo 0 > tracing_on
echo hwlat > current_tracer
echo per-cpu > hwlat_detector/mode
echo 100000 > hwlat_detector/width
echo 200000 > hwlat_detector/window
echo 1 > tracing_on
To fix the issue, check if the hwlatd thread for the cpu is already
running, before starting a new one. Along with the previous patch, this
avoids running multiple instances of the same CPU thread on the system.
Link: https://lore.kernel.org/all/20230302113654.2984709-1-tero.kristo@linux.intel.com/
Link: https://lkml.kernel.org/r/20230310100451.3948583-3-tero.kristo@linux.intel.com
Cc: stable@vger.kernel.org
Fixes: f46b16520a087 ("trace/hwlat: Implement the per-cpu mode")
Signed-off-by: Tero Kristo <tero.kristo@linux.intel.com>
Acked-by: Daniel Bristot de Oliveira <bristot@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Tero Kristo [Fri, 10 Mar 2023 10:04:50 +0000 (12:04 +0200)]
trace/hwlat: Do not wipe the contents of per-cpu thread data
Do not wipe the contents of the per-cpu kthread data when starting the
tracer, as this will completely forget about already running instances
and can later start new additional per-cpu threads.
Link: https://lore.kernel.org/all/20230302113654.2984709-1-tero.kristo@linux.intel.com/
Link: https://lkml.kernel.org/r/20230310100451.3948583-2-tero.kristo@linux.intel.com
Cc: stable@vger.kernel.org
Fixes: f46b16520a087 ("trace/hwlat: Implement the per-cpu mode")
Signed-off-by: Tero Kristo <tero.kristo@linux.intel.com>
Acked-by: Daniel Bristot de Oliveira <bristot@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Tom Rix [Thu, 9 Mar 2023 15:04:14 +0000 (10:04 -0500)]
tracing/osnoise: set several trace_osnoise.c variables storage-class-specifier to static
smatch reports several similar warnings
kernel/trace/trace_osnoise.c:220:1: warning:
symbol '__pcpu_scope_per_cpu_osnoise_var' was not declared. Should it be static?
kernel/trace/trace_osnoise.c:243:1: warning:
symbol '__pcpu_scope_per_cpu_timerlat_var' was not declared. Should it be static?
kernel/trace/trace_osnoise.c:335:14: warning:
symbol 'interface_lock' was not declared. Should it be static?
kernel/trace/trace_osnoise.c:2242:5: warning:
symbol 'timerlat_min_period' was not declared. Should it be static?
kernel/trace/trace_osnoise.c:2243:5: warning:
symbol 'timerlat_max_period' was not declared. Should it be static?
These variables are only used in trace_osnoise.c, so it should be static
Link: https://lore.kernel.org/linux-trace-kernel/20230309150414.4036764-1-trix@redhat.com
Signed-off-by: Tom Rix <trix@redhat.com>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Acked-by: Daniel Bristot de Oliveira <bristot@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Anton Gusev [Tue, 31 Jan 2023 07:58:18 +0000 (10:58 +0300)]
tracing: Fix wrong return in kprobe_event_gen_test.c
Overwriting the error code with the deletion result may cause the
function to return 0 despite encountering an error. Commit
b111545d26c0
("tracing: Remove the useless value assignment in
test_create_synth_event()") solves a similar issue by
returning the original error code, so this patch does the same.
Found by Linux Verification Center (linuxtesting.org) with SVACE.
Link: https://lore.kernel.org/linux-trace-kernel/20230131075818.5322-1-aagusev@ispras.ru
Signed-off-by: Anton Gusev <aagusev@ispras.ru>
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Linus Torvalds [Sat, 18 Mar 2023 23:01:34 +0000 (16:01 -0700)]
Merge tag 'fbdev-for-6.3-rc3' of git://git./linux/kernel/git/deller/linux-fbdev
Pull fbdev fixes from Helge Deller:
"The majority of lines changed is due to a code style cleanup in the
pnmtologo helper program.
Arnd removed the omap1 osk driver and the SIS fb driver is now
orphaned.
Other than that it's the usual bunch of small fixes and cleanups, e.g.
prevent possible divide-by-zero in various fb drivers if the pixclock
is zero and various conversions to devm_platform*() and of_property*()
functions:
- Drop omap1 osk driver
- Various potential divide by zero pixclock fixes
- Add pixelclock and fb_check_var() to stifb
- Code style cleanups and indenting fixes"
* tag 'fbdev-for-6.3-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/linux-fbdev:
fbdev: Use of_property_present() for testing DT property presence
fbdev: au1200fb: Fix potential divide by zero
fbdev: lxfb: Fix potential divide by zero
fbdev: intelfb: Fix potential divide by zero
fbdev: nvidia: Fix potential divide by zero
fbdev: stifb: Provide valid pixelclock and add fb_check_var() checks
fbdev: omapfb: remove omap1 osk driver
fbdev: xilinxfb: Use devm_platform_get_and_ioremap_resource()
fbdev: wm8505fb: Use devm_platform_ioremap_resource()
fbdev: pxa3xx-gcu: Use devm_platform_get_and_ioremap_resource()
fbdev: Use of_property_read_bool() for boolean properties
fbdev: clps711x-fb: Use devm_platform_get_and_ioremap_resource()
fbdev: tgafb: Fix potential divide by zero
MAINTAINERS: orphan SIS FRAMEBUFFER DRIVER
fbdev: omapfb: cleanup inconsistent indentation
drivers: video: logo: add SPDX comment, remove GPL notice in pnmtologo.c
drivers: video: logo: fix code style issues in pnmtologo.c
Linus Torvalds [Sat, 18 Mar 2023 18:49:48 +0000 (11:49 -0700)]
Merge tag 'kbuild-fixes-v6.3' of git://git./linux/kernel/git/masahiroy/linux-kbuild
Pull Kbuild fixes from Masahiro Yamada:
- Exclude kallsyms_seqs_of_names from kallsyms to fix build error
- Fix 'make kernelrelease' for external module builds
- Get the Debian source package compilable again
- Fix the wrong uname when Debian packages are built with the
KDEB_PKGVERSION option
- Fix superfluous CROSS_COMPILE when building Debian packages
- Fix RPM package build error when KCONFIG_CONFIG is set
- Use 'git archive' for creating source tarballs
- Remove the scripts/list-gitignored tool
* tag 'kbuild-fixes-v6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
kbuild: use git-archive for source package creation
kbuild: rpm-pkg: move source components to rpmbuild/SOURCES
kbuild: deb-pkg: use dh_listpackages to know enabled packages
kbuild: deb-pkg: split image and debug objects staging out into functions
kbuild: deb-pkg: set CROSS_COMPILE only when undefined
kbuild: deb-pkg: do not take KERNELRELEASE from the source version
kbuild: deb-pkg: make debian source package working again
Makefile: Make kernelrelease target work with M=
kconfig: Update config changed flag before calling callback
kallsyms: add kallsyms_seqs_of_names to list of special symbols
Linus Torvalds [Sat, 18 Mar 2023 18:33:44 +0000 (11:33 -0700)]
Merge tag 'hwmon-for-v6.3-rc3' of git://git./linux/kernel/git/groeck/linux-staging
Pull hwmon fixes from Guenter Roeck:
- ltc2992, adm1266: Set missing can_sleep flag
- tmp512/tmp513: Drop of_match_ptr for ID table to fix build with
!CONFIG_OF
- ucd90320: Fix back-to-back access problem
- ina3221: Fix bad error return from probe function
- xgene: Fix use-after-free bug in remove function
- adt7475: Fix hysteresis register bit masks, and fix association of
'smoothing' attributes
* tag 'hwmon-for-v6.3-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
hwmon: (ltc2992) Set `can_sleep` flag for GPIO chip
hwmon: (adm1266) Set `can_sleep` flag for GPIO chip
hwmon: tmp512: drop of_match_ptr for ID table
hwmon: (ucd90320) Add minimum delay between bus accesses
hwmon: (ina3221) return prober error code
hwmon: (xgene) Fix use after free bug in xgene_hwmon_remove due to race condition
hwmon: (adt7475) Fix masking of hysteresis registers
hwmon: (adt7475) Display smoothing attributes in correct order
Linus Torvalds [Sat, 18 Mar 2023 18:25:39 +0000 (11:25 -0700)]
Merge tag 'ata-6.3-rc3' of git://git./linux/kernel/git/dlemoal/libata
Pull ata fixes from Damien Le Moal:
- Two fixes from Ondrej for the pata_parport driver to address an issue
with error handling during drive connection and to fix memory leaks
in case of errors during initialization and when disconnecting a
device.
* tag 'ata-6.3-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata:
ata: pata_parport: fix memory leaks
ata: pata_parport: fix parport release without claim
Linus Torvalds [Fri, 17 Mar 2023 20:51:17 +0000 (13:51 -0700)]
media: m5mols: fix off-by-one loop termination error
The __find_restype() function loops over the m5mols_default_ffmt[]
array, and the termination condition ends up being wrong: instead of
stopping when the iterator becomes the size of the array it traverses,
it stops after it has already overshot the array.
Now, in practice this doesn't likely matter, because the code will
always find the entry it looks for, and will thus return early and never
hit that last extra iteration.
But it turns out that clang will unroll the loop fully, because it has
only two iterations (well, three due to the off-by-one bug), and then
clang will end up just giving up in the middle of the loop unrolling
when it notices that the code walks past the end of the array.
And that made 'objtool' very unhappy indeed, because the generated code
just falls off the edge of the universe, and ends up falling through to
the next function, causing this warning:
drivers/media/i2c/m5mols/m5mols.o: warning: objtool: m5mols_set_fmt() falls through to next function m5mols_get_frame_desc()
Fix the loop ending condition.
Reported-by: Jens Axboe <axboe@kernel.dk>
Analyzed-by: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com>
Analyzed-by: Nick Desaulniers <ndesaulniers@google.com>
Link: https://lore.kernel.org/linux-block/CAHk-=wgTSdKYbmB1JYM5vmHMcD9J9UZr0mn7BOYM_LudrP+Xvw@mail.gmail.com/
Fixes: bc125106f8af ("[media] Add support for M-5MOLS 8 Mega Pixel camera ISP")
Cc: HeungJun, Kim <riverful.kim@samsung.com>
Cc: Sylwester Nawrocki <s.nawrocki@samsung.com>
Cc: Kyungmin Park <kyungmin.park@samsung.com>
Cc: Mauro Carvalho Chehab <mchehab@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Theodore Ts'o [Sat, 18 Mar 2023 01:53:52 +0000 (21:53 -0400)]
ext4: fix possible double unlock when moving a directory
Fixes: 0813299c586b ("ext4: Fix possible corruption when moving a directory")
Link: https://lore.kernel.org/r/5efbe1b9-ad8b-4a4f-b422-24824d2b775c@kili.mountain
Reported-by: Dan Carpenter <error27@gmail.com>
Reported-by: syzbot+0c73d1d8b952c5f3d714@syzkaller.appspotmail.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Linus Torvalds [Fri, 17 Mar 2023 20:31:16 +0000 (13:31 -0700)]
Merge tag 'net-6.3-rc3' of git://git./linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Including fixes from netfilter, wifi and ipsec.
A little more changes than usual, but it's pretty normal for us that
the rc3/rc4 PRs are oversized as people start testing in earnest.
Possibly an extra boost from people deploying the 6.1 LTS but that's
more of an unscientific hunch.
Current release - regressions:
- phy: mscc: fix deadlock in phy_ethtool_{get,set}_wol()
- virtio: vsock: don't use skbuff state to account credit
- virtio: vsock: don't drop skbuff on copy failure
- virtio_net: fix page_to_skb() miscalculating the memory size
Current release - new code bugs:
- eth: correct xdp_features after device reconfig
- wifi: nl80211: fix the puncturing bitmap policy
- net/mlx5e: flower:
- fix raw counter initialization
- fix missing error code
- fix cloned flow attribute
- ipa:
- fix some register validity checks
- fix a surprising number of bad offsets
- kill FILT_ROUT_CACHE_CFG IPA register
Previous releases - regressions:
- tcp: fix bind() conflict check for dual-stack wildcard address
- veth: fix use after free in XDP_REDIRECT when skb headroom is small
- ipv4: fix incorrect table ID in IOCTL path
- ipvlan: make skb->skb_iif track skb->dev for l3s mode
- mptcp:
- fix possible deadlock in subflow_error_report
- fix UaFs when destroying unaccepted and listening sockets
- dsa: mv88e6xxx: fix max_mtu of 1492 on 6165, 6191, 6220, 6250, 6290
Previous releases - always broken:
- tcp: tcp_make_synack() can be called from process context, don't
assume preemption is disabled when updating stats
- netfilter: correct length for loading protocol registers
- virtio_net: add checking sq is full inside xdp xmit
- bonding: restore IFF_MASTER/SLAVE flags on bond enslave Ethertype
change
- phy: nxp-c45-tja11xx: fix MII_BASIC_CONFIG_REV bit number
- eth: i40e: fix crash during reboot when adapter is in recovery mode
- eth: ice: avoid deadlock on rtnl lock when auxiliary device
plug/unplug meets bonding
- dsa: mt7530:
- remove now incorrect comment regarding port 5
- set PLL frequency and trgmii only when trgmii is used
- eth: mtk_eth_soc: reset PCS state when changing interface types
Misc:
- ynl: another license adjustment
- move the TCA_EXT_WARN_MSG attribute for tc action"
* tag 'net-6.3-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (108 commits)
selftests: bonding: add tests for ether type changes
bonding: restore bond's IFF_SLAVE flag if a non-eth dev enslave fails
bonding: restore IFF_MASTER/SLAVE flags on bond enslave ether type change
net: renesas: rswitch: Fix GWTSDIE register handling
net: renesas: rswitch: Fix the output value of quote from rswitch_rx()
ethernet: sun: add check for the mdesc_grab()
net: ipa: fix some register validity checks
net: ipa: kill FILT_ROUT_CACHE_CFG IPA register
net: ipa: add two missing declarations
net: ipa: reg: include <linux/bug.h>
net: xdp: don't call notifiers during driver init
net/sched: act_api: add specific EXT_WARN_MSG for tc action
Revert "net/sched: act_api: move TCA_EXT_WARN_MSG to the correct hierarchy"
net: dsa: microchip: fix RGMII delay configuration on KSZ8765/KSZ8794/KSZ8795
ynl: make the tooling check the license
ynl: broaden the license even more
tools: ynl: make definitions optional again
hsr: ratelimit only when errors are printed
qed/qed_mng_tlv: correctly zero out ->min instead of ->hour
selftests: net: devlink_port_split.py: skip test if no suitable device available
...
Linus Torvalds [Fri, 17 Mar 2023 18:20:27 +0000 (11:20 -0700)]
Merge tag 'block-6.3-2023-03-16' of git://git.kernel.dk/linux
Pull block fixes from Jens Axboe:
"A bit bigger than usual, as the NVMe pull request missed last weeks
submission. In detail:
- NVMe pull request via Christoph:
- Avoid potential UAF in nvmet_req_complete (Damien Le Moal)
- More quirks (Elmer Miroslav Mosher Golovin, Philipp Geulen)
- Fix a memory leak in the nvme-pci probe teardown path
(Irvin Cote)
- Repair the MAINTAINERS entry (Lukas Bulwahn)
- Fix handling single range discard request (Ming Lei)
- Show more opcode names in trace events (Minwoo Im)
- Fix nvme-tcp timeout reporting (Sagi Grimberg)
- MD pull request via Song:
- Two fixes for old issues (Neil)
- Resource leak in device stopping (Xiao)
- Bio based device stats fix (Yu)
- Kill unused CONFIG_BLOCK_COMPAT (Lukas)
- sunvdc missing mdesc_grab() failure check (Liang)
- Fix for reversal of request ordering upon issue for certain cases
(Jan)
- null_blk timeout fixes (Damien)
- Loop use-after-free fix (Bart)
- blk-mq SRCU fix for BLK_MQ_F_BLOCKING devices (Chris)"
* tag 'block-6.3-2023-03-16' of git://git.kernel.dk/linux:
block: remove obsolete config BLOCK_COMPAT
md: select BLOCK_LEGACY_AUTOLOAD
block: count 'ios' and 'sectors' when io is done for bio-based device
block: sunvdc: add check for mdesc_grab() returning NULL
nvmet: avoid potential UAF in nvmet_req_complete()
nvme-trace: show more opcode names
nvme-tcp: add nvme-tcp pdu size build protection
nvme-tcp: fix opcode reporting in the timeout handler
nvme-pci: add NVME_QUIRK_BOGUS_NID for Lexar NM620
nvme-pci: add NVME_QUIRK_BOGUS_NID for Netac NV3000
nvme-pci: fixing memory leak in probe teardown path
nvme: fix handling single range discard request
MAINTAINERS: repair malformed T: entries in NVM EXPRESS DRIVERS
block: null_blk: cleanup null_queue_rq()
block: null_blk: Fix handling of fake timeout request
blk-mq: fix "bad unlock balance detected" on q->srcu in __blk_mq_run_dispatch_ops
loop: Fix use-after-free issues
block: do not reverse request order when flushing plug list
md: avoid signed overflow in slot_store()
md: Free resources in __md_stop
Linus Torvalds [Fri, 17 Mar 2023 18:12:07 +0000 (11:12 -0700)]
Merge tag 'io_uring-6.3-2023-03-16' of git://git.kernel.dk/linux
Pull io_uring fixes from Jens Axboe:
- When PF_NO_SETAFFINITY was removed for io-wq threads, we kind of
forgot about the SQPOLL thread. Remove it there as well, there's even
less of a reason to set it there (Michal)
- Fixup a confusing 'ret' setting (Li)
- When MSG_RING is used to send a direct descriptor to another ring,
it's possible to have it allocate it on the target ring rather than
provide a specific index for it. If this is done, return the chosen
value in the CQE, like we would've done locally (Pavel)
- Fix a regression in this series on huge page bvec collapsing (Pavel)
* tag 'io_uring-6.3-2023-03-16' of git://git.kernel.dk/linux:
io_uring/rsrc: fix folio accounting
io_uring/msg_ring: let target know allocated index
io_uring: rsrc: Optimize return value variable 'ret'
io_uring/sqpoll: Do not set PF_NO_SETAFFINITY on sqpoll threads
Linus Torvalds [Fri, 17 Mar 2023 18:02:26 +0000 (11:02 -0700)]
Merge tag 'pm-6.3-rc3' of git://git./linux/kernel/git/rafael/linux-pm
Pull power management fixes from Rafael Wysocki:
"These fix an error code path issue in a cpuidle driver and make the
sleepgraph utility more robust against unexpected input.
Specifics:
- Fix the psci_pd_init_topology() failure path in the PSCI cpuidle
driver (Shawn Guo)
- Modify the sleepgraph utility so it does not crash on binary data
in device names (Todd Brandt)"
* tag 'pm-6.3-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
pm-graph: sleepgraph: Avoid crashing on binary data in device names
cpuidle: psci: Iterate backwards over list in psci_pd_remove()
Linus Torvalds [Fri, 17 Mar 2023 17:57:09 +0000 (10:57 -0700)]
Merge tag 'acpi-6.3-rc3' of git://git./linux/kernel/git/rafael/linux-pm
Pull ACPI fixes from Rafael Wysocki:
"These add some new quirks, fix PPTT handling, fix an ACPI utility and
correct a mistake in the ACPI documentation.
Specifics:
- Fix ACPI PPTT handling to avoid sleep in the atomic context when it
is not present (Sudeep Holla)
- Add 'backlight=native' DMI quirk for Dell Vostro 15 3535 to the
ACPI video driver (Chia-Lin Kao)
- Add ACPI quirks for I2C device enumeration on Lenovo Yoga Book X90
and Acer Iconia One 7 B1-750 (Hans de Goede)
- Fix handling of invalid command line option values in the ACPI
pfrut utility (Chen Yu)
- Fix references to I2C device data type in the ACPI documentation
for device enumeration (Andy Shevchenko)"
* tag 'acpi-6.3-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPI: tools: pfrut: Check if the input of level and type is in the right numeric range
ACPI: PPTT: Fix to avoid sleep in the atomic context when PPTT is absent
ACPI: x86: Add skip i2c clients quirk for Lenovo Yoga Book X90
ACPI: x86: Add skip i2c clients quirk for Acer Iconia One 7 B1-750
ACPI: x86: Introduce an acpi_quirk_skip_gpio_event_handlers() helper
ACPI: video: Add backlight=native DMI quirk for Dell Vostro 15 3535
ACPI: docs: enumeration: Correct reference to the I²C device data type
Linus Torvalds [Fri, 17 Mar 2023 17:51:14 +0000 (10:51 -0700)]
Merge branch 'turbostat' of git://git./linux/kernel/git/lenb/linux
Pull turbostat fweaks and fixes from Len Brown:
"Leprechaun sized fixes and tweaks touching only turbostat.
'Keeping happy users happy since 2010'"
* 'turbostat' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux:
tools/power turbostat: version 2023.03.17
tools/power turbostat: fix decoding of HWP_STATUS
tools/power turbostat: Introduce support for EMR
tools/power turbostat: remove stray newlines from warn/warnx strings
tools/power turbostat: Fix /dev/cpu_dma_latency warnings
tools/power turbostat: Provide better debug messages for failed capabilities accesses
tools/power turbostat: update dump of SECONDARY_TURBO_RATIO_LIMIT
Linus Torvalds [Fri, 17 Mar 2023 17:45:49 +0000 (10:45 -0700)]
Merge tag 'for-linus-6.3-rc3-tag' of git://git./linux/kernel/git/xen/tip
Pull xen fixes from Juergen Gross:
- cleanup for xen time handling
- enable the VGA console in a Xen PVH dom0
- cleanup in the xenfs driver
* tag 'for-linus-6.3-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
xen: remove unnecessary (void*) conversions
x86/PVH: obtain VGA console info in Dom0
x86/xen/time: cleanup xen_tsc_safe_clocksource
xen: update arch/x86/include/asm/xen/cpuid.h
Linus Torvalds [Fri, 17 Mar 2023 17:33:33 +0000 (10:33 -0700)]
Merge tag 'riscv-for-linus-6.3-rc3' of git://git./linux/kernel/git/riscv/linux
Pull RISC-V fixes from Palmer Dabbelt:
- fixes to the ASID allocator to avoid leaking stale mappings between
tasks
- fix the vmalloc fault handler to tolerate huge pages
* tag 'riscv-for-linus-6.3-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
RISC-V: mm: Support huge page in vmalloc_fault()
riscv: asid: Fixup stale TLB entry cause application crash
Revert "riscv: mm: notify remote harts about mmu cache updates"
Linus Torvalds [Fri, 17 Mar 2023 17:15:53 +0000 (10:15 -0700)]
Merge tag 's390-6.3-3' of git://git./linux/kernel/git/s390/linux
Pull s390 fixes from Vasily Gorbik:
- Update defconfigs
- Fix early boot code by adding missing intersection check to prevent
potential overwriting of the ipl report
- Fix a use-after-free issue in s390-specific code related to PCI
resources being retained after hot-unplugging individual functions,
by removing the resources from the PCI bus's resource list and using
the zpci_bar_struct's resource pointer directly
* tag 's390-6.3-3' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390: update defconfigs
PCI: s390: Fix use-after-free of PCI resources with per-function hotplug
s390/ipl: add missing intersection check to ipl_report handling
Linus Torvalds [Fri, 17 Mar 2023 17:01:07 +0000 (10:01 -0700)]
Merge tag 'powerpc-6.3-3' of git://git./linux/kernel/git/powerpc/linux
Pull powerpc fixes from Michael Ellerman:
- Fix false detection of read faults, introduced by execute-only
support
- Fix a build failure when GENERIC_ALLOCATOR is not selected
Thanks to Russell Currey, Randy Dunlap, Michal Suchánek, Nathan Lynch,
and Benjamin Gray.
* tag 'powerpc-6.3-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc/mm: Fix false detection of read faults
powerpc/pseries: RTAS work area requires GENERIC_ALLOCATOR
Linus Torvalds [Fri, 17 Mar 2023 16:49:17 +0000 (09:49 -0700)]
Merge tag 'mmc-v6.3-rc1' of git://git./linux/kernel/git/ulfh/mmc
Pull MMC host fixes from Ulf Hansson:
- dw_mmc-starfive: Fix initialization of the prev_err variable
- sdhci_am654: Lower power-on failed message severity
* tag 'mmc-v6.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc:
mmc: dw_mmc-starfive: Fix initialization of prev_err
mmc: sdhci_am654: lower power-on failed message severity
Linus Torvalds [Fri, 17 Mar 2023 16:43:10 +0000 (09:43 -0700)]
Merge tag 'sound-6.3-rc3' of git://git./linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"Nothing surprising, a collection of small device-specific fixes.
The majority of changes are for ASoC Intel stuff, while a few other
ASoC and HD-audio fixes are found"
* tag 'sound-6.3-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (31 commits)
ALSA: hda/ca0132: fixup buffer overrun at tuning_ctl_set()
ALSA: asihpi: check pao in control_message()
ASoC: hdmi-codec: only startup/shutdown on supported streams
ASoC: da7219: Initialize jack_det_mutex
ALSA: hda: Match only Intel devices with CONTROLLER_IN_GPU()
ALSA: hda/realtek: Fix the speaker output on Samsung Galaxy Book2 Pro
ALSA: hda/realtek: fix speaker, mute/micmute LEDs not work on a HP platform
ALSA: hda: intel-dsp-config: add MTL PCI id
ASoC: SOF: IPC4: update gain ipc msg definition to align with fw
ASoC: SOF: sof-audio: don't squelch errors in WIDGET_SETUP phase
ASoC: SOF: Intel: hda-ctrl: re-add sleep after entering and exiting reset
ASoC: SOF: Intel: hda-dsp: harden D0i3 programming sequence
ASoC: SOF: ipc4-topology: set dmic dai index from copier
ASoC: SOF: sof-audio: Fix broken early bclk feature for SSP
ASoC: SOF: Intel: pci-tng: revert invalid bar size setting
ASoC: SOF: topology: Fix error handling in sof_widget_ready()
ASoC: Intel: soc-acpi: fix copy-paste issue in topology names
ASoC: SOF: ipc4-topology: Fix incorrect sample rate print unit
ASoC: SOF: ipc3: Check for upper size limit for the received message
ASOC: SOF: Intel: pci-tgl: Fix device description
...
Linus Torvalds [Fri, 17 Mar 2023 16:35:40 +0000 (09:35 -0700)]
Merge tag 'drm-fixes-2023-03-17' of git://anongit.freedesktop.org/drm/drm
Pull drm fixes from Dave Airlie:
"Seems like a pretty regular rc3, i915 and amdgpu with the usual
selection of fixes, then a scattering of fixes across misc drivers and
other areas:
accel:
- build fix for accel
edid:
- fix info leak in edid
ttm:
- fix NULL ptr deref
- reference counting fix
i915:
- Fix hwmon PL1 power limit enabling
- Fix audio ELD handling for DP MST
- Fix PSR io and wake line calculations
- Fix DG2 HDMI modes with 267.30 and 319.89 MHz pixel clocks
- Fix SSEU subslice out-of-bounds access
- Fix misuse of non-idle barriers as fence trackers
amdgpu:
- SMU 13 update
- RDNA2 suspend/resume fix when overclocking is enabled
- SRIOV VCN fixes
- HDCP suspend/resume fix
- Fix drm polling splat regression
- Fix dirty rectangle tracking for PSR
- Fix vangogh regression on certain BIOSes
- Misc display fixes
- Suspend/resume IOMMU regression fix
amdkfd:
- Fix BO offset for multi-VMA page migration
- Fix a possible double free
- Fix potential use after free
- Fix process cleanup on module exit
bridge:
- fix returned array size name documentation
fbdev:
- ref-counting fix for fbdev deferred I/O
virtio:
- dma sync fix
shmem-helper:
- error path fix
msm:
- shrinker blocking fix
panfrost:
- shrinker rpm fix
chipsfb:
- fix error code
meson:
- fix 1px pink line
- fix regulator interaction
sun4i:
- fix missing component unbind"
* tag 'drm-fixes-2023-03-17' of git://anongit.freedesktop.org/drm/drm: (38 commits)
drm/ttm: drop extra ttm_bo_put in ttm_bo_cleanup_refs
drm/amdgpu: Don't resume IOMMU after incomplete init
drm/amdkfd: Fixed kfd_process cleanup on module exit.
drm/amd/display: disconnect MPCC only on OTG change
drm/amd/display: Fix DP MST sinks removal issue
drm/amd/display: Do not set DRR on pipe Commit
drm/amd/display: Remove OTG DIV register write for Virtual signals.
drm/meson: dw-hdmi: Fix devm_regulator_*get_enable*() conversion again
drm/bridge: Fix returned array size name for atomic_get_input_bus_fmts kdoc
drm/amdgpu/vcn: Disable indirect SRAM on Vangogh broken BIOSes
drm/amdgpu/nv: fix codec array for SR_IOV
drm/amd/display: Write to correct dirty_rect
drm/amdgpu: move poll enabled/disable into non DC path
drm/amd/display: Fix HDCP failing to enable after suspend
drm/amdkfd: fix potential kgd_mem UAFs
drm/amdgpu/vcn: custom video info caps for sriov
drm/amd/pm: Fix sienna cichlid incorrect OD volage after resume
drm/amd/pm: bump SMU 13.0.4 driver_if header version
drm/amdkfd: fix a potential double free in pqm_create_queue
drm/amdkfd: Get prange->offset after svm_range_vram_node_new
...
Linus Torvalds [Fri, 17 Mar 2023 16:30:57 +0000 (09:30 -0700)]
Merge tag 'scsi-fixes' of git://git./linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"Ten patches, eight in drivers and two in the core, which correct a
regression from directory removal and add a no VPD size quirk also to
fix a regression. All pretty small"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: ufs: mcq: Use active_reqs to check busy in clock scaling
scsi: core: Fix a procfs host directory removal regression
scsi: core: Add BLIST_NO_VPD_SIZE for some VDASD
scsi: mpi3mr: Fix expander node leak in mpi3mr_remove()
scsi: mpi3mr: Fix memory leaks in mpi3mr_init_ioc()
scsi: mpi3mr: Fix sas_hba.phy memory leak in mpi3mr_remove()
scsi: mpi3mr: Fix mpi3mr_hba_port memory leak in mpi3mr_remove()
scsi: mpi3mr: Fix config page DMA memory leak
scsi: mpi3mr: Fix throttle_groups memory leak
scsi: mpt3sas: Fix NULL pointer access in mpt3sas_transport_port_add()
Rafael J. Wysocki [Fri, 17 Mar 2023 15:55:01 +0000 (16:55 +0100)]
Merge branch 'pm-cpuidle'
Merge a PSCI cpuidle driver fix for 6.3-rc1:
- Fix the psci_pd_init_topology() failure path in the PSCI cpuidle
driver (Shawn Guo).
* pm-cpuidle:
cpuidle: psci: Iterate backwards over list in psci_pd_remove()
Rafael J. Wysocki [Fri, 17 Mar 2023 15:44:41 +0000 (16:44 +0100)]
Merge branches 'acpi-video', 'acpi-x86', 'acpi-tools' and 'acpi-docs'
Merge a new ACPI backlight quirk, new ACPI quirks for I2C device
enumeration on some platforms, a pfrut utility fix and an ACPI
documentation fix for 6.3-rc3:
- Add backlight=native DMI quirk for Dell Vostro 15 3535 to the ACPI
video driver (Chia-Lin Kao).
- Add ACPI quirks for I2C devices enumeration on Lenovo Yoga Book X90
and Acer Iconia One 7 B1-750 (Hans de Goede).
- Fix handling of invalid command line option values in the ACPI pfrut
utility (Chen Yu).
- Fix references to I2C device data type in the ACPI documentation for
device enumeration (Andy Shevchenko).
* acpi-video:
ACPI: video: Add backlight=native DMI quirk for Dell Vostro 15 3535
* acpi-x86:
ACPI: x86: Add skip i2c clients quirk for Lenovo Yoga Book X90
ACPI: x86: Add skip i2c clients quirk for Acer Iconia One 7 B1-750
ACPI: x86: Introduce an acpi_quirk_skip_gpio_event_handlers() helper
* acpi-tools:
ACPI: tools: pfrut: Check if the input of level and type is in the right numeric range
* acpi-docs:
ACPI: docs: enumeration: Correct reference to the I²C device data type
Len Brown [Fri, 17 Mar 2023 15:34:10 +0000 (11:34 -0400)]
tools/power turbostat: version 2023.03.17
Happy St. Patrick's Day!
Signed-off-by: Len Brown <len.brown@intel.com>
Antti Laakso [Wed, 25 Jan 2023 13:17:50 +0000 (15:17 +0200)]
tools/power turbostat: fix decoding of HWP_STATUS
The "excursion to minimum" information is in bit2
in HWP_STATUS MSR. Fix the bitmask used for
decoding the register.
Signed-off-by: Antti Laakso <antti.laakso@intel.com>
Reviewed-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Zhang Rui [Wed, 4 Jan 2023 14:23:53 +0000 (22:23 +0800)]
tools/power turbostat: Introduce support for EMR
Introduce support for EMR.
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Reviewed-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Tested-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Len Brown [Fri, 17 Mar 2023 15:25:56 +0000 (11:25 -0400)]
tools/power turbostat: remove stray newlines from warn/warnx strings
warn(3) terminates strings with newlines
Signed-off-by: Len Brown <len.brown@intel.com>
Prarit Bhargava [Thu, 15 Dec 2022 15:18:16 +0000 (10:18 -0500)]
tools/power turbostat: Fix /dev/cpu_dma_latency warnings
When running as non-root the following error is seen in turbostat:
turbostat: fopen /dev/cpu_dma_latency
: Permission denied
turbostat and the man page have information on how to avoid other
permission errors, so these can be fixed the same way.
Provide better /dev/cpu_dma_latency warnings that provide instructions on
how to avoid the error, and update the man page.
Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Cc: linux-pm@vger.kernel.org
Signed-off-by: Len Brown <len.brown@intel.com>
Prarit Bhargava [Tue, 18 Oct 2022 19:23:37 +0000 (15:23 -0400)]
tools/power turbostat: Provide better debug messages for failed capabilities accesses
turbostat reports some capabilities access errors and not others. Provide
the same debug message for all errors.
[lenb: remove extra quotes]
Cc: David Arcari <darcari@redhat.com>
Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Len Brown [Thu, 13 Oct 2022 10:42:29 +0000 (12:42 +0200)]
tools/power turbostat: update dump of SECONDARY_TURBO_RATIO_LIMIT
cosmetic only (but useful if you copy/paste)
Signed-off-by: Len Brown <len.brown@intel.com>
David S. Miller [Fri, 17 Mar 2023 07:56:41 +0000 (07:56 +0000)]
Merge branch 'bonding-fixes'
Nikolay Aleksandrov says:
====================
bonding: properly restore flags when bond changes ether type
A bug was reported by syzbot[1] that causes a warning and a myriad of
other potential issues if a bond, that is also a slave, fails to enslave a
non-eth device. While fixing that bug I found that we have the same
issues when such enslave passes and after that the bond changes back to
ARPHRD_ETHER (again due to ether_setup). This set fixes all issues by
extracting the ether_setup() sequence in a helper which does the right
thing about bond flags when it needs to change back to ARPHRD_ETHER. It
also adds selftests for these cases.
Patch 01 adds the new bond_ether_setup helper and fixes the issues when a
bond device changes its ether type due to successful enslave. Patch 02
fixes the issues when it changes its ether type due to an unsuccessful
enslave. Note we need two patches because the bugs were introduced by
different commits. Patch 03 adds the new selftests.
Due to the comment adjustment and squash, could you please review
patch 01 again? I've kept the other acks since there were no code
changes.
v3: squash the helper patch and the first fix, adjust the comment above
it to be explicit about the bond device, no code changes
v2: new set, all patches are new due to new approach of fixing these bugs
[1] https://syzkaller.appspot.com/bug?id=
391c7b1f6522182899efba27d891f1743e8eb3ef
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Nikolay Aleksandrov [Wed, 15 Mar 2023 11:18:42 +0000 (13:18 +0200)]
selftests: bonding: add tests for ether type changes
Add new network selftests for the bonding device which exercise the ether
type changing call paths. They also test for the recent syzbot bug[1] which
causes a warning and results in wrong device flags (IFF_SLAVE missing).
The test adds three bond devices and a nlmon device, enslaves one of the
bond devices to the other and then uses the nlmon device for successful
and unsuccesful enslaves both of which change the bond ether type. Thus
we can test for both MASTER and SLAVE flags at the same time.
If the flags are properly restored we get:
TEST: Change ether type of an enslaved bond device with unsuccessful enslave [ OK ]
TEST: Change ether type of an enslaved bond device with successful enslave [ OK ]
[1] https://syzkaller.appspot.com/bug?id=
391c7b1f6522182899efba27d891f1743e8eb3ef
Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org>
Reviewed-by: Michal Kubiak <michal.kubiak@intel.com>
Acked-by: Jonathan Toppins <jtoppins@redhat.com>
Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Nikolay Aleksandrov [Wed, 15 Mar 2023 11:18:41 +0000 (13:18 +0200)]
bonding: restore bond's IFF_SLAVE flag if a non-eth dev enslave fails
syzbot reported a warning[1] where the bond device itself is a slave and
we try to enslave a non-ethernet device as the first slave which fails
but then in the error path when ether_setup() restores the bond device
it also clears all flags. In my previous fix[2] I restored the
IFF_MASTER flag, but I didn't consider the case that the bond device
itself might also be a slave with IFF_SLAVE set, so we need to restore
that flag as well. Use the bond_ether_setup helper which does the right
thing and restores the bond's flags properly.
Steps to reproduce using a nlmon dev:
$ ip l add nlmon0 type nlmon
$ ip l add bond1 type bond
$ ip l add bond2 type bond
$ ip l set bond1 master bond2
$ ip l set dev nlmon0 master bond1
$ ip -d l sh dev bond1
22: bond1: <BROADCAST,MULTICAST,MASTER> mtu 1500 qdisc noqueue master bond2 state DOWN mode DEFAULT group default qlen 1000
(now bond1's IFF_SLAVE flag is gone and we'll hit a warning[3] if we
try to delete it)
[1] https://syzkaller.appspot.com/bug?id=
391c7b1f6522182899efba27d891f1743e8eb3ef
[2] commit
7d5cd2ce5292 ("bonding: correctly handle bonding type change on enslave failure")
[3] example warning:
[ 27.008664] bond1: (slave nlmon0): The slave device specified does not support setting the MAC address
[ 27.008692] bond1: (slave nlmon0): Error -95 calling set_mac_address
[ 32.464639] bond1 (unregistering): Released all slaves
[ 32.464685] ------------[ cut here ]------------
[ 32.464686] WARNING: CPU: 1 PID: 2004 at net/core/dev.c:10829 unregister_netdevice_many+0x72a/0x780
[ 32.464694] Modules linked in: br_netfilter bridge bonding virtio_net
[ 32.464699] CPU: 1 PID: 2004 Comm: ip Kdump: loaded Not tainted 5.18.0-rc3+ #47
[ 32.464703] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.1-2.fc37 04/01/2014
[ 32.464704] RIP: 0010:unregister_netdevice_many+0x72a/0x780
[ 32.464707] Code: 99 fd ff ff ba 90 1a 00 00 48 c7 c6 f4 02 66 96 48 c7 c7 20 4d 35 96 c6 05 fa c7 2b 02 01 e8 be 6f 4a 00 0f 0b e9 73 fd ff ff <0f> 0b e9 5f fd ff ff 80 3d e3 c7 2b 02 00 0f 85 3b fd ff ff ba 59
[ 32.464710] RSP: 0018:
ffffa006422d7820 EFLAGS:
00010206
[ 32.464712] RAX:
ffff8f6e077140a0 RBX:
ffffa006422d7888 RCX:
0000000000000000
[ 32.464714] RDX:
ffff8f6e12edbe58 RSI:
0000000000000296 RDI:
ffffffff96d4a520
[ 32.464716] RBP:
ffff8f6e07714000 R08:
ffffffff96d63600 R09:
ffffa006422d7728
[ 32.464717] R10:
0000000000000ec0 R11:
ffffffff9698c988 R12:
ffff8f6e12edb140
[ 32.464719] R13:
dead000000000122 R14:
dead000000000100 R15:
ffff8f6e12edb140
[ 32.464723] FS:
00007f297c2f1740(0000) GS:
ffff8f6e5d900000(0000) knlGS:
0000000000000000
[ 32.464725] CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
[ 32.464726] CR2:
00007f297bf1c800 CR3:
00000000115e8000 CR4:
0000000000350ee0
[ 32.464730] Call Trace:
[ 32.464763] <TASK>
[ 32.464767] rtnl_dellink+0x13e/0x380
[ 32.464776] ? cred_has_capability.isra.0+0x68/0x100
[ 32.464780] ? __rtnl_unlock+0x33/0x60
[ 32.464783] ? bpf_lsm_capset+0x10/0x10
[ 32.464786] ? security_capable+0x36/0x50
[ 32.464790] rtnetlink_rcv_msg+0x14e/0x3b0
[ 32.464792] ? _copy_to_iter+0xb1/0x790
[ 32.464796] ? post_alloc_hook+0xa0/0x160
[ 32.464799] ? rtnl_calcit.isra.0+0x110/0x110
[ 32.464802] netlink_rcv_skb+0x50/0xf0
[ 32.464806] netlink_unicast+0x216/0x340
[ 32.464809] netlink_sendmsg+0x23f/0x480
[ 32.464812] sock_sendmsg+0x5e/0x60
[ 32.464815] ____sys_sendmsg+0x22c/0x270
[ 32.464818] ? import_iovec+0x17/0x20
[ 32.464821] ? sendmsg_copy_msghdr+0x59/0x90
[ 32.464823] ? do_set_pte+0xa0/0xe0
[ 32.464828] ___sys_sendmsg+0x81/0xc0
[ 32.464832] ? mod_objcg_state+0xc6/0x300
[ 32.464835] ? refill_obj_stock+0xa9/0x160
[ 32.464838] ? memcg_slab_free_hook+0x1a5/0x1f0
[ 32.464842] __sys_sendmsg+0x49/0x80
[ 32.464847] do_syscall_64+0x3b/0x90
[ 32.464851] entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 32.464865] RIP: 0033:0x7f297bf2e5e7
[ 32.464868] Code: 64 89 02 48 c7 c0 ff ff ff ff eb bb 0f 1f 80 00 00 00 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 89 54 24 1c 48 89 74 24 10
[ 32.464869] RSP: 002b:
00007ffd96c824c8 EFLAGS:
00000246 ORIG_RAX:
000000000000002e
[ 32.464872] RAX:
ffffffffffffffda RBX:
0000000000000000 RCX:
00007f297bf2e5e7
[ 32.464874] RDX:
0000000000000000 RSI:
00007ffd96c82540 RDI:
0000000000000003
[ 32.464875] RBP:
00000000640f19de R08:
0000000000000001 R09:
000000000000007c
[ 32.464876] R10:
00007f297bffabe0 R11:
0000000000000246 R12:
0000000000000001
[ 32.464877] R13:
00007ffd96c82d20 R14:
00007ffd96c82610 R15:
000055bfe38a7020
[ 32.464881] </TASK>
[ 32.464882] ---[ end trace
0000000000000000 ]---
Fixes: 7d5cd2ce5292 ("bonding: correctly handle bonding type change on enslave failure")
Reported-by: syzbot+9dfc3f3348729cc82277@syzkaller.appspotmail.com
Link: https://syzkaller.appspot.com/bug?id=391c7b1f6522182899efba27d891f1743e8eb3ef
Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org>
Reviewed-by: Michal Kubiak <michal.kubiak@intel.com>
Acked-by: Jonathan Toppins <jtoppins@redhat.com>
Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Nikolay Aleksandrov [Wed, 15 Mar 2023 11:18:40 +0000 (13:18 +0200)]
bonding: restore IFF_MASTER/SLAVE flags on bond enslave ether type change
Add bond_ether_setup helper which is used to fix ether_setup() calls in the
bonding driver. It takes care of both IFF_MASTER and IFF_SLAVE flags, the
former is always restored and the latter only if it was set.
If the bond enslaves non-ARPHRD_ETHER device (changes its type), then
releases it and enslaves ARPHRD_ETHER device (changes back) then we
use ether_setup() to restore the bond device type but it also resets its
flags and removes IFF_MASTER and IFF_SLAVE[1]. Use the bond_ether_setup
helper to restore both after such transition.
[1] reproduce (nlmon is non-ARPHRD_ETHER):
$ ip l add nlmon0 type nlmon
$ ip l add bond2 type bond mode active-backup
$ ip l set nlmon0 master bond2
$ ip l set nlmon0 nomaster
$ ip l add bond1 type bond
(we use bond1 as ARPHRD_ETHER device to restore bond2's mode)
$ ip l set bond1 master bond2
$ ip l sh dev bond2
37: bond2: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
link/ether be:d7:c5:40:5b:cc brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 68 maxmtu 1500
(notice bond2's IFF_MASTER is missing)
Fixes: e36b9d16c6a6 ("bonding: clean muticast addresses when device changes type")
Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 17 Mar 2023 07:50:51 +0000 (07:50 +0000)]
Merge branch 'net-renesas-rswitch-fixes'
Yoshihiro Shimoda says:
====================
net: renesas: rswitch: Fix rx and timestamp
I got reports locally about issues on the rswitch driver.
So, fix the issues.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Yoshihiro Shimoda [Wed, 15 Mar 2023 07:04:24 +0000 (16:04 +0900)]
net: renesas: rswitch: Fix GWTSDIE register handling
Since the GWCA has the TX timestamp feature, this driver
should not disable it if one of ports is opened. So, fix it.
Reported-by: Phong Hoang <phong.hoang.wz@renesas.com>
Fixes: 33f5d733b589 ("net: renesas: rswitch: Improve TX timestamp accuracy")
Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Yoshihiro Shimoda [Wed, 15 Mar 2023 07:04:23 +0000 (16:04 +0900)]
net: renesas: rswitch: Fix the output value of quote from rswitch_rx()
If the RX descriptor doesn't have any data, the output value of quote
from rswitch_rx() will be increased unexpectedily. So, fix it.
Reported-by: Volodymyr Babchuk <volodymyr_babchuk@epam.com>
Fixes: 3590918b5d07 ("net: ethernet: renesas: Add support for "Ethernet Switch"")
Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Liang He [Wed, 15 Mar 2023 06:00:21 +0000 (14:00 +0800)]
ethernet: sun: add check for the mdesc_grab()
In vnet_port_probe() and vsw_port_probe(), we should
check the return value of mdesc_grab() as it may
return NULL which can caused NPD bugs.
Fixes: 5d01fa0c6bd8 ("ldmvsw: Add ldmvsw.c driver code")
Fixes: 43fdf27470b2 ("[SPARC64]: Abstract out mdesc accesses for better MD update handling.")
Signed-off-by: Liang He <windhl@126.com>
Reviewed-by: Piotr Raczynski <piotr.raczynski@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>