Stanislav Fomichev [Thu, 11 May 2023 17:04:54 +0000 (10:04 -0700)]
selftests/bpf: Update EFAULT {g,s}etsockopt selftests
Instead of assuming EFAULT, let's assume the BPF program's
output is ignored.
Remove "getsockopt: deny arbitrary ctx->retval" because it
was actually testing optlen. We have separate set of tests
for retval.
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/r/20230511170456.1759459-3-sdf@google.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Stanislav Fomichev [Thu, 11 May 2023 17:04:53 +0000 (10:04 -0700)]
bpf: Don't EFAULT for {g,s}setsockopt with wrong optlen
With the way the hooks implemented right now, we have a special
condition: optval larger than PAGE_SIZE will expose only first 4k into
BPF; any modifications to the optval are ignored. If the BPF program
doesn't handle this condition by resetting optlen to 0,
the userspace will get EFAULT.
The intention of the EFAULT was to make it apparent to the
developers that the program is doing something wrong.
However, this inadvertently might affect production workloads
with the BPF programs that are not too careful (i.e., returning EFAULT
for perfectly valid setsockopt/getsockopt calls).
Let's try to minimize the chance of BPF program screwing up userspace
by ignoring the output of those BPF programs (instead of returning
EFAULT to the userspace). pr_info_once those cases to
the dmesg to help with figuring out what's going wrong.
Fixes:
0d01da6afc54 ("bpf: implement getsockopt and setsockopt hooks")
Suggested-by: Martin KaFai Lau <martin.lau@kernel.org>
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/r/20230511170456.1759459-2-sdf@google.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Andrii Nakryiko [Tue, 9 May 2023 06:55:02 +0000 (23:55 -0700)]
libbpf: fix offsetof() and container_of() to work with CO-RE
It seems like __builtin_offset() doesn't preserve CO-RE field
relocations properly. So if offsetof() macro is defined through
__builtin_offset(), CO-RE-enabled BPF code using container_of() will be
subtly and silently broken.
To avoid this problem, redefine offsetof() and container_of() in the
form that works with CO-RE relocations more reliably.
Fixes:
5fbc220862fc ("tools/libpf: Add offsetof/container_of macro in bpf_helpers.h")
Reported-by: Lennart Poettering <lennart@poettering.net>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/r/20230509065502.2306180-1-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Martin KaFai Lau [Thu, 11 May 2023 04:37:48 +0000 (21:37 -0700)]
bpf: Address KCSAN report on bpf_lru_list
KCSAN reported a data-race when accessing node->ref.
Although node->ref does not have to be accurate,
take this chance to use a more common READ_ONCE() and WRITE_ONCE()
pattern instead of data_race().
There is an existing bpf_lru_node_is_ref() and bpf_lru_node_set_ref().
This patch also adds bpf_lru_node_clear_ref() to do the
WRITE_ONCE(node->ref, 0) also.
==================================================================
BUG: KCSAN: data-race in __bpf_lru_list_rotate / __htab_lru_percpu_map_update_elem
write to 0xffff888137038deb of 1 bytes by task 11240 on cpu 1:
__bpf_lru_node_move kernel/bpf/bpf_lru_list.c:113 [inline]
__bpf_lru_list_rotate_active kernel/bpf/bpf_lru_list.c:149 [inline]
__bpf_lru_list_rotate+0x1bf/0x750 kernel/bpf/bpf_lru_list.c:240
bpf_lru_list_pop_free_to_local kernel/bpf/bpf_lru_list.c:329 [inline]
bpf_common_lru_pop_free kernel/bpf/bpf_lru_list.c:447 [inline]
bpf_lru_pop_free+0x638/0xe20 kernel/bpf/bpf_lru_list.c:499
prealloc_lru_pop kernel/bpf/hashtab.c:290 [inline]
__htab_lru_percpu_map_update_elem+0xe7/0x820 kernel/bpf/hashtab.c:1316
bpf_percpu_hash_update+0x5e/0x90 kernel/bpf/hashtab.c:2313
bpf_map_update_value+0x2a9/0x370 kernel/bpf/syscall.c:200
generic_map_update_batch+0x3ae/0x4f0 kernel/bpf/syscall.c:1687
bpf_map_do_batch+0x2d9/0x3d0 kernel/bpf/syscall.c:4534
__sys_bpf+0x338/0x810
__do_sys_bpf kernel/bpf/syscall.c:5096 [inline]
__se_sys_bpf kernel/bpf/syscall.c:5094 [inline]
__x64_sys_bpf+0x43/0x50 kernel/bpf/syscall.c:5094
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
read to 0xffff888137038deb of 1 bytes by task 11241 on cpu 0:
bpf_lru_node_set_ref kernel/bpf/bpf_lru_list.h:70 [inline]
__htab_lru_percpu_map_update_elem+0x2f1/0x820 kernel/bpf/hashtab.c:1332
bpf_percpu_hash_update+0x5e/0x90 kernel/bpf/hashtab.c:2313
bpf_map_update_value+0x2a9/0x370 kernel/bpf/syscall.c:200
generic_map_update_batch+0x3ae/0x4f0 kernel/bpf/syscall.c:1687
bpf_map_do_batch+0x2d9/0x3d0 kernel/bpf/syscall.c:4534
__sys_bpf+0x338/0x810
__do_sys_bpf kernel/bpf/syscall.c:5096 [inline]
__se_sys_bpf kernel/bpf/syscall.c:5094 [inline]
__x64_sys_bpf+0x43/0x50 kernel/bpf/syscall.c:5094
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
value changed: 0x01 -> 0x00
Reported by Kernel Concurrency Sanitizer on:
CPU: 0 PID: 11241 Comm: syz-executor.3 Not tainted 6.3.0-rc7-syzkaller-00136-g6a66fdd29ea1 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 03/30/2023
==================================================================
Reported-by: syzbot+ebe648a84e8784763f82@syzkaller.appspotmail.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/r/20230511043748.1384166-1-martin.lau@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Alan Maguire [Wed, 10 May 2023 13:02:41 +0000 (14:02 +0100)]
bpf: Add --skip_encoding_btf_inconsistent_proto, --btf_gen_optimized to pahole flags for v1.25
v1.25 of pahole supports filtering out functions with multiple inconsistent
function prototypes or optimized-out parameters from the BTF representation.
These present problems because there is no additional info in BTF saying which
inconsistent prototype matches which function instance to help guide attachment,
and functions with optimized-out parameters can lead to incorrect assumptions
about register contents.
So for now, filter out such functions while adding BTF representations for
functions that have "."-suffixes (foo.isra.0) but not optimized-out parameters.
This patch assumes that below linked changes land in pahole for v1.25.
Issues with pahole filtering being too aggressive in removing functions
appear to be resolved now, but CI and further testing will confirm.
Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20230510130241.1696561-1-alan.maguire@oracle.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Alexei Starovoitov [Sat, 6 May 2023 23:42:58 +0000 (16:42 -0700)]
Merge branch 'Dynptr Verifier Adjustments'
Daniel Rosenberg says:
====================
These patches relax a few verifier requirements around dynptrs.
Patches 1-3 are unchanged from v2, apart from rebasing
Patch 4 is the same as in v1, see
https://lore.kernel.org/bpf/CA+PiJmST4WUH061KaxJ4kRL=fqy3X6+Wgb2E2rrLT5OYjUzxfQ@mail.gmail.com/
Patch 5 adds a test for the change in Patch 4
====================
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Daniel Rosenberg [Sat, 6 May 2023 01:31:34 +0000 (18:31 -0700)]
selftests/bpf: Accept mem from dynptr in helper funcs
This ensures that buffers retrieved from dynptr_data are allowed to be
passed in to helpers that take mem, like bpf_strncmp
Signed-off-by: Daniel Rosenberg <drosen@google.com>
Link: https://lore.kernel.org/r/20230506013134.2492210-6-drosen@google.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Daniel Rosenberg [Sat, 6 May 2023 01:31:33 +0000 (18:31 -0700)]
bpf: verifier: Accept dynptr mem as mem in helpers
This allows using memory retrieved from dynptrs with helper functions
that accept ARG_PTR_TO_MEM. For instance, results from bpf_dynptr_data
can be passed along to bpf_strncmp.
Signed-off-by: Daniel Rosenberg <drosen@google.com>
Link: https://lore.kernel.org/r/20230506013134.2492210-5-drosen@google.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Daniel Rosenberg [Sat, 6 May 2023 01:31:32 +0000 (18:31 -0700)]
selftests/bpf: Check overflow in optional buffer
This ensures we still reject invalid memory accesses in buffers that are
marked optional.
Signed-off-by: Daniel Rosenberg <drosen@google.com>
Link: https://lore.kernel.org/r/20230506013134.2492210-4-drosen@google.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Daniel Rosenberg [Sat, 6 May 2023 01:31:31 +0000 (18:31 -0700)]
selftests/bpf: Test allowing NULL buffer in dynptr slice
bpf_dynptr_slice(_rw) no longer requires a buffer for verification. If the
buffer is needed, but not present, the function will return NULL.
Signed-off-by: Daniel Rosenberg <drosen@google.com>
Link: https://lore.kernel.org/r/20230506013134.2492210-3-drosen@google.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Daniel Rosenberg [Sat, 6 May 2023 01:31:30 +0000 (18:31 -0700)]
bpf: Allow NULL buffers in bpf_dynptr_slice(_rw)
bpf_dynptr_slice(_rw) uses a user provided buffer if it can not provide
a pointer to a block of contiguous memory. This buffer is unused in the
case of local dynptrs, and may be unused in other cases as well. There
is no need to require the buffer, as the kfunc can just return NULL if
it was needed and not provided.
This adds another kfunc annotation, __opt, which combines with __sz and
__szk to allow the buffer associated with the size to be NULL. If the
buffer is NULL, the verifier does not check that the buffer is of
sufficient size.
Signed-off-by: Daniel Rosenberg <drosen@google.com>
Link: https://lore.kernel.org/r/20230506013134.2492210-2-drosen@google.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Alexei Starovoitov [Sat, 6 May 2023 20:56:38 +0000 (13:56 -0700)]
Merge branch 'Introduce a new kfunc of bpf_task_under_cgroup'
Feng zhou says:
====================
Trace sched related functions, such as enqueue_task_fair, it is necessary to
specify a task instead of the current task which within a given cgroup.
====================
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Feng Zhou [Sat, 6 May 2023 03:15:45 +0000 (11:15 +0800)]
selftests/bpf: Add testcase for bpf_task_under_cgroup
test_progs:
Tests new kfunc bpf_task_under_cgroup().
The bpf program saves the new task's pid within a given cgroup to
the remote_pid, which is convenient for the user-mode program to
verify the test correctness.
The user-mode program creates its own mount namespace, and mounts the
cgroupsv2 hierarchy in there, call the fork syscall, then check if
remote_pid and local_pid are unequal.
Signed-off-by: Feng Zhou <zhoufeng.zf@bytedance.com>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/r/20230506031545.35991-3-zhoufeng.zf@bytedance.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Feng Zhou [Sat, 6 May 2023 03:15:44 +0000 (11:15 +0800)]
bpf: Add bpf_task_under_cgroup() kfunc
Add a kfunc that's similar to the bpf_current_task_under_cgroup.
The difference is that it is a designated task.
When hook sched related functions, sometimes it is necessary to
specify a task instead of the current task.
Signed-off-by: Feng Zhou <zhoufeng.zf@bytedance.com>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/r/20230506031545.35991-2-zhoufeng.zf@bytedance.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Pengcheng Yang [Fri, 5 May 2023 08:50:58 +0000 (16:50 +0800)]
samples/bpf: Fix buffer overflow in tcp_basertt
Using sizeof(nv) or strlen(nv)+1 is correct.
Fixes:
c890063e4404 ("bpf: sample BPF_SOCKET_OPS_BASE_RTT program")
Signed-off-by: Pengcheng Yang <yangpc@wangsu.com>
Link: https://lore.kernel.org/r/1683276658-2860-1-git-send-email-yangpc@wangsu.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Will Hawkins [Fri, 28 Apr 2023 02:30:15 +0000 (22:30 -0400)]
bpf, docs: Update llvm_relocs.rst with typo fixes
Correct a few typographical errors and fix some mistakes in examples.
Signed-off-by: Will Hawkins <hawkinsw@obs.cr>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/r/20230428023015.1698072-2-hawkinsw@obs.cr
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Alexei Starovoitov [Fri, 5 May 2023 05:35:36 +0000 (22:35 -0700)]
Merge branch 'Add precision propagation for subprogs and callbacks'
Andrii Nakryiko says:
====================
As more and more real-world BPF programs become more complex
and increasingly use subprograms (both static and global), scalar precision
tracking and its (previously weak) support for BPF subprograms (and callbacks
as a special case of that) is becoming more and more of an issue and
limitation. Couple that with increasing reliance on state equivalence (BPF
open-coded iterators have a hard requirement for state equivalence to converge
and successfully validate loops), and it becomes pretty critical to address
this limitation and make precision tracking universally supported for BPF
programs of any complexity and composition.
This patch set teaches BPF verifier to support SCALAR precision
backpropagation across multiple frames (for subprogram calls and callback
simulations) and addresses most practical situations (SCALAR stack
loads/stores using registers other than r10 being the last remaining
limitation, though thankfully rarely used in practice).
Main logic is explained in details in patch #8. The rest are preliminary
preparations, refactorings, clean ups, and fixes. See respective patches for
details.
Patch #8 has also veristat comparison of results for selftests, Cilium, and
some of Meta production BPF programs before and after these changes.
v2->v3:
- drop bitcnt and ifs from bt_xxx() helpers (Alexei);
v1->v2:
- addressed review feedback form Alexei, adjusted commit messages, comments,
added verbose(), WARN_ONCE(), etc;
- re-ran all the tests and veristat on selftests, cilium, and meta-internal
code: no new changes and no kernel warnings.
====================
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Andrii Nakryiko [Fri, 5 May 2023 04:33:17 +0000 (21:33 -0700)]
selftests/bpf: revert iter test subprog precision workaround
Now that precision propagation is supported fully in the presence of
subprogs, there is no need to work around iter test. Revert original
workaround.
This reverts
be7dbd275dc6 ("selftests/bpf: avoid mark_all_scalars_precise() trigger in one of iter tests").
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20230505043317.3629845-11-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Andrii Nakryiko [Fri, 5 May 2023 04:33:16 +0000 (21:33 -0700)]
selftests/bpf: add precision propagation tests in the presence of subprogs
Add a bunch of tests validating verifier's precision backpropagation
logic in the presence of subprog calls and/or callback-calling
helpers/kfuncs.
We validate the following conditions:
- subprog_result_precise: static subprog r0 result precision handling;
- global_subprog_result_precise: global subprog r0 precision
shortcutting, similar to BPF helper handling;
- callback_result_precise: similarly r0 marking precise for
callback-calling helpers;
- parent_callee_saved_reg_precise, parent_callee_saved_reg_precise_global:
propagation of precision for callee-saved registers bypassing
static/global subprogs;
- parent_callee_saved_reg_precise_with_callback: same as above, but in
the presence of callback-calling helper;
- parent_stack_slot_precise, parent_stack_slot_precise_global:
similar to above, but instead propagating precision of stack slot
(spilled SCALAR reg);
- parent_stack_slot_precise_with_callback: same as above, but in the
presence of callback-calling helper;
- subprog_arg_precise: propagation of precision of static subprog's
input argument back to caller;
- subprog_spill_into_parent_stack_slot_precise: negative test
validating that verifier currently can't support backtracking of stack
access with non-r10 register, we validate that we fallback to
forcing precision for all SCALARs.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20230505043317.3629845-10-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Andrii Nakryiko [Fri, 5 May 2023 04:33:15 +0000 (21:33 -0700)]
bpf: support precision propagation in the presence of subprogs
Add support precision backtracking in the presence of subprogram frames in
jump history.
This means supporting a few different kinds of subprogram invocation
situations, all requiring a slightly different handling in precision
backtracking handling logic:
- static subprogram calls;
- global subprogram calls;
- callback-calling helpers/kfuncs.
For each of those we need to handle a few precision propagation cases:
- what to do with precision of subprog returns (r0);
- what to do with precision of input arguments;
- for all of them callee-saved registers in caller function should be
propagated ignoring subprog/callback part of jump history.
N.B. Async callback-calling helpers (currently only
bpf_timer_set_callback()) are transparent to all this because they set
a separate async callback environment and thus callback's history is not
shared with main program's history. So as far as all the changes in this
commit goes, such helper is just a regular helper.
Let's look at all these situation in more details. Let's start with
static subprogram being called, using an exxerpt of a simple main
program and its static subprog, indenting subprog's frame slightly to
make everything clear.
frame 0 frame 1 precision set
======= ======= =============
9: r6 = 456;
10: r1 = 123; fr0: r6
11: call pc+10; fr0: r1, r6
22: r0 = r1; fr0: r6; fr1: r1
23: exit fr0: r6; fr1: r0
12: r1 = <map_pointer> fr0: r0, r6
13: r1 += r0; fr0: r0, r6
14: r1 += r6; fr0: r6
15: exit
As can be seen above main function is passing 123 as single argument to
an identity (`return x;`) subprog. Returned value is used to adjust map
pointer offset, which forces r0 to be marked as precise. Then
instruction #14 does the same for callee-saved r6, which will have to be
backtracked all the way to instruction #9. For brevity, precision sets
for instruction #13 and #14 are combined in the diagram above.
First, for subprog calls, r0 returned from subprog (in frame 0) has to
go into subprog's frame 1, and should be cleared from frame 0. So we go
back into subprog's frame knowing we need to mark r0 precise. We then
see that insn #22 sets r0 from r1, so now we care about marking r1
precise. When we pop up from subprog's frame back into caller at
insn #11 we keep r1, as it's an argument-passing register, so we eventually
find `10: r1 = 123;` and satify precision propagation chain for insn #13.
This example demonstrates two sets of rules:
- r0 returned after subprog call has to be moved into subprog's r0 set;
- *static* subprog arguments (r1-r5) are moved back to caller precision set.
Let's look at what happens with callee-saved precision propagation. Insn #14
mark r6 as precise. When we get into subprog's frame, we keep r6 in
frame 0's precision set *only*. Subprog itself has its own set of
independent r6-r10 registers and is not affected. When we eventually
made our way out of subprog frame we keep r6 in precision set until we
reach `9: r6 = 456;`, satisfying propagation. r6-r10 propagation is
perhaps the simplest aspect, it always stays in its original frame.
That's pretty much all we have to do to support precision propagation
across *static subprog* invocation.
Let's look at what happens when we have global subprog invocation.
frame 0 frame 1 precision set
======= ======= =============
9: r6 = 456;
10: r1 = 123; fr0: r6
11: call pc+10; # global subprog fr0: r6
12: r1 = <map_pointer> fr0: r0, r6
13: r1 += r0; fr0: r0, r6
14: r1 += r6; fr0: r6;
15: exit
Starting from insn #13, r0 has to be precise. We backtrack all the way
to insn #11 (call pc+10) and see that subprog is global, so was already
validated in isolation. As opposed to static subprog, global subprog
always returns unknown scalar r0, so that satisfies precision
propagation and we drop r0 from precision set. We are done for insns #13.
Now for insn #14. r6 is in precision set, we backtrack to `call pc+10;`.
Here we need to recognize that this is effectively both exit and entry
to global subprog, which means we stay in caller's frame. So we carry on
with r6 still in precision set, until we satisfy it at insn #9. The only
hard part with global subprogs is just knowing when it's a global func.
Lastly, callback-calling helpers and kfuncs do simulate subprog calls,
so jump history will have subprog instructions in between caller
program's instructions, but the rules of propagating r0 and r1-r5
differ, because we don't actually directly call callback. We actually
call helper/kfunc, which at runtime will call subprog, so the only
difference between normal helper/kfunc handling is that we need to make
sure to skip callback simulatinog part of jump history.
Let's look at an example to make this clearer.
frame 0 frame 1 precision set
======= ======= =============
8: r6 = 456;
9: r1 = 123; fr0: r6
10: r2 = &callback; fr0: r6
11: call bpf_loop; fr0: r6
22: r0 = r1; fr0: r6 fr1:
23: exit fr0: r6 fr1:
12: r1 = <map_pointer> fr0: r0, r6
13: r1 += r0; fr0: r0, r6
14: r1 += r6; fr0: r6;
15: exit
Again, insn #13 forces r0 to be precise. As soon as we get to `23: exit`
we see that this isn't actually a static subprog call (it's `call
bpf_loop;` helper call instead). So we clear r0 from precision set.
For callee-saved register, there is no difference: it stays in frame 0's
precision set, we go through insn #22 and #23, ignoring them until we
get back to caller frame 0, eventually satisfying precision backtrack
logic at insn #8 (`r6 = 456;`).
Assuming callback needed to set r0 as precise at insn #23, we'd
backtrack to insn #22, switching from r0 to r1, and then at the point
when we pop back to frame 0 at insn #11, we'll clear r1-r5 from
precision set, as we don't really do a subprog call directly, so there
is no input argument precision propagation.
That's pretty much it. With these changes, it seems like the only still
unsupported situation for precision backpropagation is the case when
program is accessing stack through registers other than r10. This is
still left as unsupported (though rare) case for now.
As for results. For selftests, few positive changes for bigger programs,
cls_redirect in dynptr variant benefitting the most:
[vmuser@archvm bpf]$ ./veristat -C ~/subprog-precise-before-results.csv ~/subprog-precise-after-results.csv -f @veristat.cfg -e file,prog,insns -f 'insns_diff!=0'
File Program Insns (A) Insns (B) Insns (DIFF)
---------------------------------------- ------------- --------- --------- ----------------
pyperf600_bpf_loop.bpf.linked1.o on_event 2060 2002 -58 (-2.82%)
test_cls_redirect_dynptr.bpf.linked1.o cls_redirect 15660 2914 -12746 (-81.39%)
test_cls_redirect_subprogs.bpf.linked1.o cls_redirect 61620 59088 -2532 (-4.11%)
xdp_synproxy_kern.bpf.linked1.o syncookie_tc 109980 86278 -23702 (-21.55%)
xdp_synproxy_kern.bpf.linked1.o syncookie_xdp 97716 85147 -12569 (-12.86%)
Cilium progress don't really regress. They don't use subprogs and are
mostly unaffected, but some other fixes and improvements could have
changed something. This doesn't appear to be the case:
[vmuser@archvm bpf]$ ./veristat -C ~/subprog-precise-before-results-cilium.csv ~/subprog-precise-after-results-cilium.csv -e file,prog,insns -f 'insns_diff!=0'
File Program Insns (A) Insns (B) Insns (DIFF)
------------- ------------------------------ --------- --------- ------------
bpf_host.o tail_nodeport_nat_ingress_ipv6 4983 5003 +20 (+0.40%)
bpf_lxc.o tail_nodeport_nat_ingress_ipv6 4983 5003 +20 (+0.40%)
bpf_overlay.o tail_nodeport_nat_ingress_ipv6 4983 5003 +20 (+0.40%)
bpf_xdp.o tail_handle_nat_fwd_ipv6 12475 12504 +29 (+0.23%)
bpf_xdp.o tail_nodeport_nat_ingress_ipv6 6363 6371 +8 (+0.13%)
Looking at (somewhat anonymized) Meta production programs, we see mostly
insignificant variation in number of instructions, with one program
(syar_bind6_protect6) benefitting the most at -17%.
[vmuser@archvm bpf]$ ./veristat -C ~/subprog-precise-before-results-fbcode.csv ~/subprog-precise-after-results-fbcode.csv -e prog,insns -f 'insns_diff!=0'
Program Insns (A) Insns (B) Insns (DIFF)
------------------------ --------- --------- ----------------
on_request_context_event 597 585 -12 (-2.01%)
read_async_py_stack 43789 43657 -132 (-0.30%)
read_sync_py_stack 35041 37599 +2558 (+7.30%)
rrm_usdt 946 940 -6 (-0.63%)
sysarmor_inet6_bind 28863 28249 -614 (-2.13%)
sysarmor_inet_bind 28845 28240 -605 (-2.10%)
syar_bind4_protect4 154145 147640 -6505 (-4.22%)
syar_bind6_protect6 165242 137088 -28154 (-17.04%)
syar_task_exit_setgid 21289 19720 -1569 (-7.37%)
syar_task_exit_setuid 21290 19721 -1569 (-7.37%)
do_uprobe 19967 19413 -554 (-2.77%)
tw_twfw_ingress 215877 204833 -11044 (-5.12%)
tw_twfw_tc_in 215877 204833 -11044 (-5.12%)
But checking duration (wall clock) differences, that is the actual time taken
by verifier to validate programs, we see a sometimes dramatic improvements, all
the way to about 16x improvements:
[vmuser@archvm bpf]$ ./veristat -C ~/subprog-precise-before-results-meta.csv ~/subprog-precise-after-results-meta.csv -e prog,duration -s duration_diff^ | head -n20
Program Duration (us) (A) Duration (us) (B) Duration (us) (DIFF)
---------------------------------------- ----------------- ----------------- --------------------
tw_twfw_ingress 4488374 272836 -4215538 (-93.92%)
tw_twfw_tc_in 4339111 268175 -4070936 (-93.82%)
tw_twfw_egress 3521816 270751 -3251065 (-92.31%)
tw_twfw_tc_eg 3472878 284294 -3188584 (-91.81%)
balancer_ingress 343119 291391 -51728 (-15.08%)
syar_bind6_protect6 78992 64782 -14210 (-17.99%)
ttls_tc_ingress 11739 8176 -3563 (-30.35%)
kprobe__security_inode_link 13864 11341 -2523 (-18.20%)
read_sync_py_stack 21927 19442 -2485 (-11.33%)
read_async_py_stack 30444 28136 -2308 (-7.58%)
syar_task_exit_setuid 10256 8440 -1816 (-17.71%)
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20230505043317.3629845-9-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Andrii Nakryiko [Fri, 5 May 2023 04:33:14 +0000 (21:33 -0700)]
bpf: fix mark_all_scalars_precise use in mark_chain_precision
When precision backtracking bails out due to some unsupported sequence
of instructions (e.g., stack access through register other than r10), we
need to mark all SCALAR registers as precise to be safe. Currently,
though, we mark SCALARs precise only starting from the state we detected
unsupported condition, which could be one of the parent states of the
actual current state. This will leave some registers potentially not
marked as precise, even though they should. So make sure we start
marking scalars as precise from current state (env->cur_state).
Further, we don't currently detect a situation when we end up with some
stack slots marked as needing precision, but we ran out of available
states to find the instructions that populate those stack slots. This is
akin the `i >= func->allocated_stack / BPF_REG_SIZE` check and should be
handled similarly by falling back to marking all SCALARs precise. Add
this check when we run out of states.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20230505043317.3629845-8-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Andrii Nakryiko [Fri, 5 May 2023 04:33:13 +0000 (21:33 -0700)]
bpf: fix propagate_precision() logic for inner frames
Fix propagate_precision() logic to perform propagation of all necessary
registers and stack slots across all active frames *in one batch step*.
Doing this for each register/slot in each individual frame is wasteful,
but the main problem is that backtracking of instruction in any frame
except the deepest one just doesn't work. This is due to backtracking
logic relying on jump history, and available jump history always starts
(or ends, depending how you view it) in current frame. So, if
prog A (frame #0) called subprog B (frame #1) and we need to propagate
precision of, say, register R6 (callee-saved) within frame #0, we
actually don't even know where jump history that corresponds to prog
A even starts. We'd need to skip subprog part of jump history first to
be able to do this.
Luckily, with struct backtrack_state and __mark_chain_precision()
handling bitmasks tracking/propagation across all active frames at the
same time (added in previous patch), propagate_precision() can be both
fixed and sped up by setting all the necessary bits across all frames
and then performing one __mark_chain_precision() pass. This makes it
unnecessary to skip subprog parts of jump history.
We also improve logging along the way, to clearly specify which
registers' and slots' precision markings are propagated within which
frame. Each frame will have dedicated line and all registers and stack
slots from that frame will be reported in format similar to precision
backtrack regs/stack logging. E.g.:
frame 1: propagating r1,r2,r3,fp-8,fp-16
frame 0: propagating r3,r9,fp-120
Fixes:
529409ea92d5 ("bpf: propagate precision across all frames, not just the last one")
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20230505043317.3629845-7-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Andrii Nakryiko [Fri, 5 May 2023 04:33:12 +0000 (21:33 -0700)]
bpf: maintain bitmasks across all active frames in __mark_chain_precision
Teach __mark_chain_precision logic to maintain register/stack masks
across all active frames when going from child state to parent state.
Currently this should be mostly no-op, as precision backtracking usually
bails out when encountering subprog entry/exit.
It's not very apparent from the diff due to increased indentation, but
the logic remains the same, except everything is done on specific `fr`
frame index. Calls to bt_clear_reg() and bt_clear_slot() are replaced
with frame-specific bt_clear_frame_reg() and bt_clear_frame_slot(),
where frame index is passed explicitly, instead of using current frame
number.
We also adjust logging to emit affected frame number. And we also add
better logging of human-readable register and stack slot masks, similar
to previous patch.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20230505043317.3629845-6-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Andrii Nakryiko [Fri, 5 May 2023 04:33:11 +0000 (21:33 -0700)]
bpf: improve precision backtrack logging
Add helper to format register and stack masks in more human-readable
format. Adjust logging a bit during backtrack propagation and especially
during forcing precision fallback logic to make it clearer what's going
on (with log_level=2, of course), and also start reporting affected
frame depth. This is in preparation for having more than one active
frame later when precision propagation between subprog calls is added.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20230505043317.3629845-5-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Andrii Nakryiko [Fri, 5 May 2023 04:33:10 +0000 (21:33 -0700)]
bpf: encapsulate precision backtracking bookkeeping
Add struct backtrack_state and straightforward API around it to keep
track of register and stack masks used and maintained during precision
backtracking process. Having this logic separately allow to keep
high-level backtracking algorithm cleaner, but also it sets us up to
cleanly keep track of register and stack masks per frame, allowing (with
some further logic adjustments) to perform precision backpropagation
across multiple frames (i.e., subprog calls).
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20230505043317.3629845-4-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Andrii Nakryiko [Fri, 5 May 2023 04:33:09 +0000 (21:33 -0700)]
bpf: mark relevant stack slots scratched for register read instructions
When handling instructions that read register slots, mark relevant stack
slots as scratched so that verifier log would contain those slots' states, in
addition to currently emitted registers with stack slot offsets.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20230505043317.3629845-3-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Andrii Nakryiko [Fri, 5 May 2023 04:33:08 +0000 (21:33 -0700)]
veristat: add -t flag for adding BPF_F_TEST_STATE_FREQ program flag
Sometimes during debugging it's important that BPF program is loaded
with BPF_F_TEST_STATE_FREQ flag set to force verifier to do frequent
state checkpointing. Teach veristat to do this when -t ("test state")
flag is specified.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20230505043317.3629845-2-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Kenjiro Nakayama [Thu, 4 May 2023 03:54:43 +0000 (12:54 +0900)]
libbpf: Fix comment about arc and riscv arch in bpf_tracing.h
To make comments about arc and riscv arch in bpf_tracing.h accurate,
this patch fixes the comment about arc and adds the comment for riscv.
Signed-off-by: Kenjiro Nakayama <nakayamakenjiro@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20230504035443.427927-1-nakayamakenjiro@gmail.com
Kui-Feng Lee [Tue, 2 May 2023 18:14:18 +0000 (11:14 -0700)]
bpf: Print a warning only if writing to unprivileged_bpf_disabled.
Only print the warning message if you are writing to
"/proc/sys/kernel/unprivileged_bpf_disabled".
The kernel may print an annoying warning when you read
"/proc/sys/kernel/unprivileged_bpf_disabled" saying
WARNING: Unprivileged eBPF is enabled with eIBRS on, data leaks possible
via Spectre v2 BHB attacks!
However, this message is only meaningful when the feature is
disabled or enabled.
Signed-off-by: Kui-Feng Lee <kuifeng@meta.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20230502181418.308479-1-kuifeng@meta.com
Yonghong Song [Tue, 2 May 2023 18:05:43 +0000 (11:05 -0700)]
bpf: Emit struct bpf_tcp_sock type in vmlinux BTF
In one of our internal testing, we found a case where
- uapi struct bpf_tcp_sock is in vmlinux.h where vmlinux.h is not
generated from the testing kernel
- struct bpf_tcp_sock is not in vmlinux BTF
The above combination caused bpf load failure as the following
memory access
struct bpf_tcp_sock *tcp_sock = ...;
... tcp_sock->snd_cwnd ...
needs CORE relocation but the relocation cannot be resolved since
the kernel BTF does not have corresponding type.
Similar to other previous cases (nf_conn___init, tcp6_sock, mctcp_sock, etc.),
add the type to vmlinux BTF with BTF_EMIT_TYPE macro.
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20230502180543.1832140-1-yhs@fb.com
Andrii Nakryiko [Mon, 1 May 2023 22:30:02 +0000 (15:30 -0700)]
Merge branch 'selftests/bpf: test_progs can read test lists from file'
Stephen Veiss says:
====================
BPF selftests have ALLOWLIST and DENYLIST files, used to control which
tests are run in CI. These files are currently parsed by a shell
script. [1]
This patchset allows those files to be specified directly on the
test_progs command line (eg, as -a @ALLOWLIST).
This also fixes a bug in the existing test filter code causing
unnecessary duplicate top-level test filter entries to be created.
[1] https://github.com/kernel-patches/vmtest/blob/
57feb460047b69f891cf4afe3cc860794a2ced17/ci/vmtest/run_selftests.sh#L21-L27
---
v2:
- error handling style changes per reviewer comments
- fdopen return value checking in test_parse_test_list_file
v1:
https://lore.kernel.org/bpf/
20230425225401.1075796-1-sveiss@meta.com/
====================
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Stephen Veiss [Thu, 27 Apr 2023 22:53:33 +0000 (15:53 -0700)]
selftests/bpf: Test_progs can read test lists from file
Improve test selection logic when using -a/-b/-d/-t options.
The list of tests to include or exclude can now be read from a file,
specified as @<filename>.
The file contains one name (or wildcard pattern) per line, and
comments beginning with # are ignored.
These options can be passed multiple times to read more than one file.
Signed-off-by: Stephen Veiss <sveiss@meta.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20230427225333.3506052-3-sveiss@meta.com
Stephen Veiss [Thu, 27 Apr 2023 22:53:32 +0000 (15:53 -0700)]
selftests/bpf: Extract insert_test from parse_test_list
Split the logic to insert new tests into test filter sets out from
parse_test_list.
Fix the subtest insertion logic to reuse an existing top-level test
filter, which prevents the creation of duplicate top-level test filters
each with a single subtest.
Signed-off-by: Stephen Veiss <sveiss@meta.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20230427225333.3506052-2-sveiss@meta.com
Martin KaFai Lau [Fri, 28 Apr 2023 01:36:38 +0000 (18:36 -0700)]
libbpf: btf_dump_type_data_check_overflow needs to consider BTF_MEMBER_BITFIELD_SIZE
The btf_dump/struct_data selftest is failing with:
[...]
test_btf_dump_struct_data:FAIL:unexpected return value dumping fs_context unexpected unexpected return value dumping fs_context: actual -7 != expected 264
[...]
The reason is in btf_dump_type_data_check_overflow(). It does not use
BTF_MEMBER_BITFIELD_SIZE from the struct's member (btf_member). Instead,
it is using the enum size which is 4. It had been working till the recent
commit
4e04143c869c ("fs_context: drop the unused lsm_flags member")
removed an integer member which also removed the 4 bytes padding at the
end of the fs_context. Missing this 4 bytes padding exposed this bug. In
particular, when btf_dump_type_data_check_overflow() reaches the member
'phase', -E2BIG is returned.
The fix is to pass bit_sz to btf_dump_type_data_check_overflow(). In
btf_dump_type_data_check_overflow(), it does a different size check when
bit_sz is not zero.
The current fs_context:
[3600] ENUM 'fs_context_purpose' encoding=UNSIGNED size=4 vlen=3
'FS_CONTEXT_FOR_MOUNT' val=0
'FS_CONTEXT_FOR_SUBMOUNT' val=1
'FS_CONTEXT_FOR_RECONFIGURE' val=2
[3601] ENUM 'fs_context_phase' encoding=UNSIGNED size=4 vlen=7
'FS_CONTEXT_CREATE_PARAMS' val=0
'FS_CONTEXT_CREATING' val=1
'FS_CONTEXT_AWAITING_MOUNT' val=2
'FS_CONTEXT_AWAITING_RECONF' val=3
'FS_CONTEXT_RECONF_PARAMS' val=4
'FS_CONTEXT_RECONFIGURING' val=5
'FS_CONTEXT_FAILED' val=6
[3602] STRUCT 'fs_context' size=264 vlen=21
'ops' type_id=3603 bits_offset=0
'uapi_mutex' type_id=235 bits_offset=64
'fs_type' type_id=872 bits_offset=1216
'fs_private' type_id=21 bits_offset=1280
'sget_key' type_id=21 bits_offset=1344
'root' type_id=781 bits_offset=1408
'user_ns' type_id=251 bits_offset=1472
'net_ns' type_id=984 bits_offset=1536
'cred' type_id=1785 bits_offset=1600
'log' type_id=3621 bits_offset=1664
'source' type_id=42 bits_offset=1792
'security' type_id=21 bits_offset=1856
's_fs_info' type_id=21 bits_offset=1920
'sb_flags' type_id=20 bits_offset=1984
'sb_flags_mask' type_id=20 bits_offset=2016
's_iflags' type_id=20 bits_offset=2048
'purpose' type_id=3600 bits_offset=2080 bitfield_size=8
'phase' type_id=3601 bits_offset=2088 bitfield_size=8
'need_free' type_id=67 bits_offset=2096 bitfield_size=1
'global' type_id=67 bits_offset=2097 bitfield_size=1
'oldapi' type_id=67 bits_offset=2098 bitfield_size=1
Fixes:
920d16af9b42 ("libbpf: BTF dumper support for typed data")
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20230428013638.1581263-1-martin.lau@linux.dev
Martin KaFai Lau [Fri, 28 Apr 2023 03:37:44 +0000 (20:37 -0700)]
selftests/bpf: Add fexit_sleep to DENYLIST.aarch64
It is reported that the fexit_sleep never returns in aarch64.
The remaining tests cannot start. Put this test into DENYLIST.aarch64
for now so that other tests can continue to run in the CI.
Acked-by: Manu Bretelle <chantr4@gmail.com>
Reported-by: Manu Bretelle <chantra@meta.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Yonghong Song [Tue, 25 Apr 2023 17:47:44 +0000 (10:47 -0700)]
selftests/bpf: Fix selftest test_global_funcs/global_func1 failure with latest clang
The selftest test_global_funcs/global_func1 failed with the latest clang17.
The reason is due to upstream ArgumentPromotionPass ([1]),
which may manipulate static function parameters and cause inlining
although the funciton is marked as noinline.
The original code:
static __attribute__ ((noinline))
int f0(int var, struct __sk_buff *skb)
{
return skb->len;
}
__attribute__ ((noinline))
int f1(struct __sk_buff *skb)
{
...
return f0(0, skb) + skb->len;
}
...
SEC("tc")
__failure __msg("combined stack size of 4 calls is 544")
int global_func1(struct __sk_buff *skb)
{
return f0(1, skb) + f1(skb) + f2(2, skb) + f3(3, skb, 4);
}
After ArgumentPromotionPass, the code is translated to
static __attribute__ ((noinline))
int f0(int var, int skb_len)
{
return skb_len;
}
__attribute__ ((noinline))
int f1(struct __sk_buff *skb)
{
...
return f0(0, skb->len) + skb->len;
}
...
SEC("tc")
__failure __msg("combined stack size of 4 calls is 544")
int global_func1(struct __sk_buff *skb)
{
return f0(1, skb->len) + f1(skb) + f2(2, skb) + f3(3, skb, 4);
}
And later llvm InstCombine phase recognized that f0()
simplify returns the value of the second argument and removed f0()
completely and the final code looks like:
__attribute__ ((noinline))
int f1(struct __sk_buff *skb)
{
...
return skb->len + skb->len;
}
...
SEC("tc")
__failure __msg("combined stack size of 4 calls is 544")
int global_func1(struct __sk_buff *skb)
{
return skb->len + f1(skb) + f2(2, skb) + f3(3, skb, 4);
}
If f0() is not inlined, the verification will fail with stack size
544 for a particular callchain. With f0() inlined, the maximum
stack size is 512 which is in the limit.
Let us add a `asm volatile ("")` in f0() to prevent ArgumentPromotionPass
from hoisting the code to its caller, and this fixed the test failure.
[1] https://reviews.llvm.org/D148269
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20230425174744.1758515-1-yhs@fb.com
Xueming Feng [Thu, 27 Apr 2023 12:03:13 +0000 (20:03 +0800)]
bpftool: Dump map id instead of value for map_of_maps types
When using `bpftool map dump` with map_of_maps, it is usually
more convenient to show the inner map id instead of raw value.
We are changing the plain print behavior to show inner_map_id
instead of hex value, this would help with quick look up of
inner map with `bpftool map dump id <inner_map_id>`.
To avoid disrupting scripted behavior, we will add a new
`inner_map_id` field to json output instead of replacing value.
plain print:
```
$ bpftool map dump id 138
Without Patch:
key:
fc 00 00 00 00 00 00 00 00 00 00 00 00 00 00 05
27 16 06 00
value:
8b 00 00 00
Found 1 element
With Patch:
key:
fc 00 00 00 00 00 00 00 00 00 00 00 00 00 00 05
27 16 06 00
inner_map_id:
139
Found 1 element
```
json print:
```
$ bpftool -p map dump id 567
Without Patch:
[{
"key": ["0xc0","0x00","0x02","0x05","0x27","0x16","0x06","0x00"
],
"value": ["0x38","0x02","0x00","0x00"
]
}
]
With Patch:
[{
"key": ["0xc0","0x00","0x02","0x05","0x27","0x16","0x06","0x00"
],
"value": ["0x38","0x02","0x00","0x00"
],
"inner_map_id": 568
}
]
```
Signed-off-by: Xueming Feng <kuro@kuroa.me>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/r/20230427120313.43574-1-kuro@kuroa.me
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Kal Conley [Sun, 23 Apr 2023 18:01:56 +0000 (20:01 +0200)]
xsk: Use pool->dma_pages to check for DMA
Compare pool->dma_pages instead of pool->dma_pages_cnt to check for an
active DMA mapping. pool->dma_pages needs to be read anyway to access
the map so this compiles to more efficient code.
Signed-off-by: Kal Conley <kal.conley@dectris.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Magnus Karlsson <magnus.karlsson@intel.com>
Link: https://lore.kernel.org/bpf/20230423180157.93559-1-kal.conley@dectris.com
Florent Revest [Thu, 27 Apr 2023 14:32:07 +0000 (16:32 +0200)]
selftests/bpf: Update the aarch64 tests deny list
Now that ftrace supports direct call on arm64, BPF tracing programs work
on that architecture. This fixes the vast majority of BPF selftests
except for:
- multi_kprobe programs which require fprobe, not available on arm64 yet
- tracing_struct which requires trampoline support to access struct args
This patch updates the list of BPF selftests which are known to fail so
the BPF CI can validate the tests which pass now.
Signed-off-by: Florent Revest <revest@chromium.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/bpf/20230427143207.635263-1-revest@chromium.org
Jesper Dangaard Brouer [Tue, 18 Apr 2023 13:31:03 +0000 (15:31 +0200)]
selftests/bpf: xdp_hw_metadata track more timestamps
To correlate the hardware RX timestamp with something, add tracking of
two software timestamps both clock source CLOCK_TAI (see description in
man clock_gettime(2)).
XDP metadata is extended with xdp_timestamp for capturing when XDP
received the packet. Populated with BPF helper bpf_ktime_get_tai_ns(). I
could not find a BPF helper for getting CLOCK_REALTIME, which would have
been preferred. In userspace when AF_XDP sees the packet another
software timestamp is recorded via clock_gettime() also clock source
CLOCK_TAI.
Example output shortly after loading igc driver:
poll: 1 (0) skip=1 fail=0 redir=2
xsk_ring_cons__peek: 1
0x12557a8: rx_desc[1]->addr=
100000000009000 addr=9100 comp_addr=9000
rx_hash: 0x82A96531 with RSS type:0x1
rx_timestamp:
1681740540304898909 (sec:
1681740540.3049)
XDP RX-time:
1681740577304958316 (sec:
1681740577.3050) delta sec:37.0001 (
37000059.407 usec)
AF_XDP time:
1681740577305051315 (sec:
1681740577.3051) delta sec:0.0001 (92.999 usec)
0x12557a8: complete idx=9 addr=9000
The first observation is that the 37 sec difference between RX HW vs XDP
timestamps, which indicate hardware is likely clock source
CLOCK_REALTIME, because (as of this writing) CLOCK_TAI is initialised
with a 37 sec offset.
The 93 usec (microsec) difference between XDP vs AF_XDP userspace is the
userspace wakeup time. On this hardware it was caused by CPU idle sleep
states, which can be reduced by tuning /dev/cpu_dma_latency.
View current requested/allowed latency bound via:
hexdump --format '"%d\n"' /dev/cpu_dma_latency
More explanation of the output and how this can be used to identify
clock drift for the HW clock can be seen here[1]:
[1] https://github.com/xdp-project/xdp-project/blob/master/areas/hints/xdp_hints_kfuncs02_driver_igc.org
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Stanislav Fomichev <sdf@google.com>
Acked-by: Song Yoong Siang <yoong.siang.song@intel.com>
Link: https://lore.kernel.org/bpf/168182466298.616355.2544377890818617459.stgit@firesoul
Jesper Dangaard Brouer [Tue, 18 Apr 2023 13:30:57 +0000 (15:30 +0200)]
igc: Add XDP hints kfuncs for RX timestamp
The NIC hardware RX timestamping mechanism adds an optional tailored
header before the MAC header containing packet reception time. Optional
depending on RX descriptor TSIP status bit (IGC_RXDADV_STAT_TSIP). In
case this bit is set driver does offset adjustments to packet data start
and extracts the timestamp.
The timestamp need to be extracted before invoking the XDP bpf_prog,
because this area just before the packet is also accessible by XDP via
data_meta context pointer (and helper bpf_xdp_adjust_meta). Thus, an XDP
bpf_prog can potentially overwrite this and corrupt data that we want to
extract with the new kfunc for reading the timestamp.
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Song Yoong Siang <yoong.siang.song@intel.com>
Link: https://lore.kernel.org/bpf/168182465791.616355.2583922957423587914.stgit@firesoul
Jesper Dangaard Brouer [Tue, 18 Apr 2023 13:30:52 +0000 (15:30 +0200)]
igc: Add XDP hints kfuncs for RX hash
This implements XDP hints kfunc for RX-hash (xmo_rx_hash).
The HW rss hash type is handled via mapping table.
This igc driver (default config) does L3 hashing for UDP packets
(excludes UDP src/dest ports in hash calc). Meaning RSS hash type is
L3 based. Tested that the igc_rss_type_num for UDP is either
IGC_RSS_TYPE_HASH_IPV4 or IGC_RSS_TYPE_HASH_IPV6.
This patch also updates AF_XDP zero-copy function igc_clean_rx_irq_zc()
to use the xdp_buff wrapper struct igc_xdp_buff.
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Song Yoong Siang <yoong.siang.song@intel.com>
Link: https://lore.kernel.org/bpf/168182465285.616355.2701740913376314790.stgit@firesoul
Jesper Dangaard Brouer [Tue, 18 Apr 2023 13:30:47 +0000 (15:30 +0200)]
igc: Add igc_xdp_buff wrapper for xdp_buff in driver
Driver specific metadata data for XDP-hints kfuncs are propagated via tail
extending the struct xdp_buff with a locally scoped driver struct.
Zero-Copy AF_XDP/XSK does similar tricks via struct xdp_buff_xsk. This
xdp_buff_xsk struct contains a CB area (24 bytes) that can be used for
extending the locally scoped driver into. The XSK_CHECK_PRIV_TYPE define
catch size violations build time.
The changes needed for AF_XDP zero-copy in igc_clean_rx_irq_zc()
is done in next patch, because the member rx_desc isn't available
at this point.
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Song Yoong Siang <yoong.siang.song@intel.com>
Link: https://lore.kernel.org/bpf/168182464779.616355.3761989884165609387.stgit@firesoul
Jesper Dangaard Brouer [Tue, 18 Apr 2023 13:30:42 +0000 (15:30 +0200)]
igc: Enable and fix RX hash usage by netstack
When function igc_rx_hash() was introduced in v4.20 via commit
0507ef8a0372
("igc: Add transmit and receive fastpath and interrupt handlers"), the
hardware wasn't configured to provide RSS hash, thus it made sense to not
enable net_device NETIF_F_RXHASH feature bit.
The NIC hardware was configured to enable RSS hash info in v5.2 via commit
2121c2712f82 ("igc: Add multiple receive queues control supporting"), but
forgot to set the NETIF_F_RXHASH feature bit.
The original implementation of igc_rx_hash() didn't extract the associated
pkt_hash_type, but statically set PKT_HASH_TYPE_L3. The largest portions of
this patch are about extracting the RSS Type from the hardware and mapping
this to enum pkt_hash_types. This was based on Foxville i225 software user
manual rev-1.3.1 and tested on Intel Ethernet Controller I225-LM (rev 03).
For UDP it's worth noting that RSS (type) hashing have been disabled both for
IPv4 and IPv6 (see IGC_MRQC_RSS_FIELD_IPV4_UDP + IGC_MRQC_RSS_FIELD_IPV6_UDP)
because hardware RSS doesn't handle fragmented pkts well when enabled (can
cause out-of-order). This results in PKT_HASH_TYPE_L3 for UDP packets, and
hash value doesn't include UDP port numbers. Not being PKT_HASH_TYPE_L4, have
the effect that netstack will do a software based hash calc calling into
flow_dissect, but only when code calls skb_get_hash(), which doesn't
necessary happen for local delivery.
For QA verification testing I wrote a small bpftrace prog:
[0] https://github.com/xdp-project/xdp-project/blob/master/areas/hints/monitor_skb_hash_on_dev.bt
Fixes:
2121c2712f82 ("igc: Add multiple receive queues control supporting")
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Song Yoong Siang <yoong.siang.song@intel.com>
Link: https://lore.kernel.org/bpf/168182464270.616355.11391652654430626584.stgit@firesoul
Kui-Feng Lee [Fri, 21 Apr 2023 21:41:31 +0000 (14:41 -0700)]
bpftool: Show map IDs along with struct_ops links.
A new link type, BPF_LINK_TYPE_STRUCT_OPS, was added to attach
struct_ops to links. (
226bc6ae6405) It would be helpful for users to
know which map is associated with the link.
The assumption was that every link is associated with a BPF program, but
this does not hold true for struct_ops. It would be better to display
map_id instead of prog_id for struct_ops links. However, some tools may
rely on the old assumption and need a prog_id. The discussion on the
mailing list suggests that tools should parse JSON format. We will maintain
the existing JSON format by adding a map_id without removing prog_id. As
for plain text format, we will remove prog_id from the header line and add
a map_id for struct_ops links.
Signed-off-by: Kui-Feng Lee <kuifeng@meta.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Quentin Monnet <quentin@isovalent.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20230421214131.352662-1-kuifeng@meta.com
Joe Stringer [Sat, 22 Apr 2023 17:20:54 +0000 (10:20 -0700)]
docs/bpf: Add LRU internals description and graph
Extend the bpf hashmap docs to include a brief description of the
internals of the LRU map type (setting appropriate API expectations),
including the original commit message from Martin and a variant on the
graph that I had presented during my Linux Plumbers Conference 2022
talk on "Pressure feedback for LRU map types"[0].
The node names in the dot file correspond roughly to the functions
where the logic for those decisions or steps is defined, to help
curious developers to cross-reference and update this logic if the
details of the LRU implementation ever differ from this description.
[0] https://lpc.events/event/16/contributions/1368/
Signed-off-by: Joe Stringer <joe@isovalent.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Bagas Sanjaya <bagasdotme@gmail.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20230422172054.3355436-2-joe@isovalent.com
Joe Stringer [Sat, 22 Apr 2023 17:20:53 +0000 (10:20 -0700)]
docs/bpf: Add table to describe LRU properties
Depending on the map type and flags for LRU, different properties are
global or percpu. Add a table to describe these.
Signed-off-by: Joe Stringer <joe@isovalent.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20230422172054.3355436-1-joe@isovalent.com
Daniel Borkmann [Tue, 4 Apr 2023 14:05:58 +0000 (14:05 +0000)]
selftests/bpf: Add test case to assert precise scalar path pruning
Add a test case to check for precision marking of safe paths. Ensure
that the verifier will not prematurely prune scalars contributing to
registers needing precision.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Joanne Koong [Thu, 20 Apr 2023 07:14:14 +0000 (00:14 -0700)]
selftests/bpf: Add tests for dynptr convenience helpers
Add various tests for the added dynptr convenience helpers.
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20230420071414.570108-6-joannelkoong@gmail.com
Joanne Koong [Thu, 20 Apr 2023 07:14:13 +0000 (00:14 -0700)]
bpf: Add bpf_dynptr_clone
The cloned dynptr will point to the same data as its parent dynptr,
with the same type, offset, size and read-only properties.
Any writes to a dynptr will be reflected across all instances
(by 'instance', this means any dynptrs that point to the same
underlying data).
Please note that data slice and dynptr invalidations will affect all
instances as well. For example, if bpf_dynptr_write() is called on an
skb-type dynptr, all data slices of dynptr instances to that skb
will be invalidated as well (eg data slices of any clones, parents,
grandparents, ...). Another example is if a ringbuf dynptr is submitted,
any instance of that dynptr will be invalidated.
Changing the view of the dynptr (eg advancing the offset or
trimming the size) will only affect that dynptr and not affect any
other instances.
One example use case where cloning may be helpful is for hashing or
iterating through dynptr data. Cloning will allow the user to maintain
the original view of the dynptr for future use, while also allowing
views to smaller subsets of the data after the offset is advanced or the
size is trimmed.
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20230420071414.570108-5-joannelkoong@gmail.com
Joanne Koong [Thu, 20 Apr 2023 07:14:12 +0000 (00:14 -0700)]
bpf: Add bpf_dynptr_size
bpf_dynptr_size returns the number of usable bytes in a dynptr.
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20230420071414.570108-4-joannelkoong@gmail.com
Joanne Koong [Thu, 20 Apr 2023 07:14:11 +0000 (00:14 -0700)]
bpf: Add bpf_dynptr_is_null and bpf_dynptr_is_rdonly
bpf_dynptr_is_null returns true if the dynptr is null / invalid
(determined by whether ptr->data is NULL), else false if
the dynptr is a valid dynptr.
bpf_dynptr_is_rdonly returns true if the dynptr is read-only,
else false if the dynptr is read-writable. If the dynptr is
null / invalid, false is returned by default.
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20230420071414.570108-3-joannelkoong@gmail.com
Joanne Koong [Thu, 20 Apr 2023 07:14:10 +0000 (00:14 -0700)]
bpf: Add bpf_dynptr_adjust
Add a new kfunc
int bpf_dynptr_adjust(struct bpf_dynptr_kern *ptr, u32 start, u32 end);
which adjusts the dynptr to reflect the new [start, end) interval.
In particular, it advances the offset of the dynptr by "start" bytes,
and if end is less than the size of the dynptr, then this will trim the
dynptr accordingly.
Adjusting the dynptr interval may be useful in certain situations.
For example, when hashing which takes in generic dynptrs, if the dynptr
points to a struct but only a certain memory region inside the struct
should be hashed, adjust can be used to narrow in on the
specific region to hash.
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20230420071414.570108-2-joannelkoong@gmail.com
Linus Torvalds [Wed, 26 Apr 2023 23:07:23 +0000 (16:07 -0700)]
Merge tag 'net-next-6.4' of git://git./linux/kernel/git/netdev/net-next
Pull networking updates from Paolo Abeni:
"Core:
- Introduce a config option to tweak MAX_SKB_FRAGS. Increasing the
default value allows for better BIG TCP performances
- Reduce compound page head access for zero-copy data transfers
- RPS/RFS improvements, avoiding unneeded NET_RX_SOFTIRQ when
possible
- Threaded NAPI improvements, adding defer skb free support and
unneeded softirq avoidance
- Address dst_entry reference count scalability issues, via false
sharing avoidance and optimize refcount tracking
- Add lockless accesses annotation to sk_err[_soft]
- Optimize again the skb struct layout
- Extends the skb drop reasons to make it usable by multiple
subsystems
- Better const qualifier awareness for socket casts
BPF:
- Add skb and XDP typed dynptrs which allow BPF programs for more
ergonomic and less brittle iteration through data and
variable-sized accesses
- Add a new BPF netfilter program type and minimal support to hook
BPF programs to netfilter hooks such as prerouting or forward
- Add more precise memory usage reporting for all BPF map types
- Adds support for using {FOU,GUE} encap with an ipip device
operating in collect_md mode and add a set of BPF kfuncs for
controlling encap params
- Allow BPF programs to detect at load time whether a particular
kfunc exists or not, and also add support for this in light
skeleton
- Bigger batch of BPF verifier improvements to prepare for upcoming
BPF open-coded iterators allowing for less restrictive looping
capabilities
- Rework RCU enforcement in the verifier, add kptr_rcu and enforce
BPF programs to NULL-check before passing such pointers into kfunc
- Add support for kptrs in percpu hashmaps, percpu LRU hashmaps and
in local storage maps
- Enable RCU semantics for task BPF kptrs and allow referenced kptr
tasks to be stored in BPF maps
- Add support for refcounted local kptrs to the verifier for allowing
shared ownership, useful for adding a node to both the BPF list and
rbtree
- Add BPF verifier support for ST instructions in
convert_ctx_access() which will help new -mcpu=v4 clang flag to
start emitting them
- Add ARM32 USDT support to libbpf
- Improve bpftool's visual program dump which produces the control
flow graph in a DOT format by adding C source inline annotations
Protocols:
- IPv4: Allow adding to IPv4 address a 'protocol' tag. Such value
indicates the provenance of the IP address
- IPv6: optimize route lookup, dropping unneeded R/W lock acquisition
- Add the handshake upcall mechanism, allowing the user-space to
implement generic TLS handshake on kernel's behalf
- Bridge: support per-{Port, VLAN} neighbor suppression, increasing
resilience to nodes failures
- SCTP: add support for Fair Capacity and Weighted Fair Queueing
schedulers
- MPTCP: delay first subflow allocation up to its first usage. This
will allow for later better LSM interaction
- xfrm: Remove inner/outer modes from input/output path. These are
not needed anymore
- WiFi:
- reduced neighbor report (RNR) handling for AP mode
- HW timestamping support
- support for randomized auth/deauth TA for PASN privacy
- per-link debugfs for multi-link
- TC offload support for mac80211 drivers
- mac80211 mesh fast-xmit and fast-rx support
- enable Wi-Fi 7 (EHT) mesh support
Netfilter:
- Add nf_tables 'brouting' support, to force a packet to be routed
instead of being bridged
- Update bridge netfilter and ovs conntrack helpers to handle IPv6
Jumbo packets properly, i.e. fetch the packet length from
hop-by-hop extension header. This is needed for BIT TCP support
- The iptables 32bit compat interface isn't compiled in by default
anymore
- Move ip(6)tables builtin icmp matches to the udptcp one. This has
the advantage that icmp/icmpv6 match doesn't load the
iptables/ip6tables modules anymore when iptables-nft is used
- Extended netlink error report for netdevice in flowtables and
netdev/chains. Allow for incrementally add/delete devices to netdev
basechain. Allow to create netdev chain without device
Driver API:
- Remove redundant Device Control Error Reporting Enable, as PCI core
has already error reporting enabled at enumeration time
- Move Multicast DB netlink handlers to core, allowing devices other
then bridge to use them
- Allow the page_pool to directly recycle the pages from safely
localized NAPI
- Implement lockless TX queue stop/wake combo macros, allowing for
further code de-duplication and sanitization
- Add YNL support for user headers and struct attrs
- Add partial YNL specification for devlink
- Add partial YNL specification for ethtool
- Add tc-mqprio and tc-taprio support for preemptible traffic classes
- Add tx push buf len param to ethtool, specifies the maximum number
of bytes of a transmitted packet a driver can push directly to the
underlying device
- Add basic LED support for switch/phy
- Add NAPI documentation, stop relaying on external links
- Convert dsa_master_ioctl() to netdev notifier. This is a
preparatory work to make the hardware timestamping layer selectable
by user space
- Add transceiver support and improve the error messages for CAN-FD
controllers
New hardware / drivers:
- Ethernet:
- AMD/Pensando core device support
- MediaTek MT7981 SoC
- MediaTek MT7988 SoC
- Broadcom BCM53134 embedded switch
- Texas Instruments CPSW9G ethernet switch
- Qualcomm EMAC3 DWMAC ethernet
- StarFive JH7110 SoC
- NXP CBTX ethernet PHY
- WiFi:
- Apple M1 Pro/Max devices
- RealTek rtl8710bu/rtl8188gu
- RealTek rtl8822bs, rtl8822cs and rtl8821cs SDIO chipset
- Bluetooth:
- Realtek RTL8821CS, RTL8851B, RTL8852BS
- Mediatek MT7663, MT7922
- NXP w8997
- Actions Semi ATS2851
- QTI WCN6855
- Marvell 88W8997
- Can:
- STMicroelectronics bxcan stm32f429
Drivers:
- Ethernet NICs:
- Intel (1G, icg):
- add tracking and reporting of QBV config errors
- add support for configuring max SDU for each Tx queue
- Intel (100G, ice):
- refactor mailbox overflow detection to support Scalable IOV
- GNSS interface optimization
- Intel (i40e):
- support XDP multi-buffer
- nVidia/Mellanox:
- add the support for linux bridge multicast offload
- enable TC offload for egress and engress MACVLAN over bond
- add support for VxLAN GBP encap/decap flows offload
- extend packet offload to fully support libreswan
- support tunnel mode in mlx5 IPsec packet offload
- extend XDP multi-buffer support
- support MACsec VLAN offload
- add support for dynamic msix vectors allocation
- drop RX page_cache and fully use page_pool
- implement thermal zone to report NIC temperature
- Netronome/Corigine:
- add support for multi-zone conntrack offload
- Solarflare/Xilinx:
- support offloading TC VLAN push/pop actions to the MAE
- support TC decap rules
- support unicast PTP
- Other NICs:
- Broadcom (bnxt): enforce software based freq adjustments only on
shared PHC NIC
- RealTek (r8169): refactor to addess ASPM issues during NAPI poll
- Micrel (lan8841): add support for PTP_PF_PEROUT
- Cadence (macb): enable PTP unicast
- Engleder (tsnep): add XDP socket zero-copy support
- virtio-net: implement exact header length guest feature
- veth: add page_pool support for page recycling
- vxlan: add MDB data path support
- gve: add XDP support for GQI-QPL format
- geneve: accept every ethertype
- macvlan: allow some packets to bypass broadcast queue
- mana: add support for jumbo frame
- Ethernet high-speed switches:
- Microchip (sparx5): Add support for TC flower templates
- Ethernet embedded switches:
- Broadcom (b54):
- configure 6318 and 63268 RGMII ports
- Marvell (mv88e6xxx):
- faster C45 bus scan
- Microchip:
- lan966x:
- add support for IS1 VCAP
- better TX/RX from/to CPU performances
- ksz9477: add ETS Qdisc support
- ksz8: enhance static MAC table operations and error handling
- sama7g5: add PTP capability
- NXP (ocelot):
- add support for external ports
- add support for preemptible traffic classes
- Texas Instruments:
- add CPSWxG SGMII support for J7200 and J721E
- Intel WiFi (iwlwifi):
- preparation for Wi-Fi 7 EHT and multi-link support
- EHT (Wi-Fi 7) sniffer support
- hardware timestamping support for some devices/firwmares
- TX beacon protection on newer hardware
- Qualcomm 802.11ax WiFi (ath11k):
- MU-MIMO parameters support
- ack signal support for management packets
- RealTek WiFi (rtw88):
- SDIO bus support
- better support for some SDIO devices (e.g. MAC address from
efuse)
- RealTek WiFi (rtw89):
- HW scan support for 8852b
- better support for 6 GHz scanning
- support for various newer firmware APIs
- framework firmware backwards compatibility
- MediaTek WiFi (mt76):
- P2P support
- mesh A-MSDU support
- EHT (Wi-Fi 7) support
- coredump support"
* tag 'net-next-6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2078 commits)
net: phy: hide the PHYLIB_LEDS knob
net: phy: marvell-88x2222: remove unnecessary (void*) conversions
tcp/udp: Fix memleaks of sk and zerocopy skbs with TX timestamp.
net: amd: Fix link leak when verifying config failed
net: phy: marvell: Fix inconsistent indenting in led_blink_set
lan966x: Don't use xdp_frame when action is XDP_TX
tsnep: Add XDP socket zero-copy TX support
tsnep: Add XDP socket zero-copy RX support
tsnep: Move skb receive action to separate function
tsnep: Add functions for queue enable/disable
tsnep: Rework TX/RX queue initialization
tsnep: Replace modulo operation with mask
net: phy: dp83867: Add led_brightness_set support
net: phy: Fix reading LED reg property
drivers: nfc: nfcsim: remove return value check of `dev_dir`
net: phy: dp83867: Remove unnecessary (void*) conversions
net: ethtool: coalesce: try to make user settings stick twice
net: mana: Check if netdev/napi_alloc_frag returns single page
net: mana: Rename mana_refill_rxoob and remove some empty lines
net: veth: add page_pool stats
...
Linus Torvalds [Wed, 26 Apr 2023 22:39:25 +0000 (15:39 -0700)]
Merge tag 'scsi-misc' of git://git./linux/kernel/git/jejb/scsi
Pull SCSI updates from James Bottomley:
"Updates to the usual drivers (megaraid_sas, scsi_debug, lpfc, target,
mpi3mr, hisi_sas, arcmsr).
The major core change is the constification of the host templates
(which touches everything) along with other minor fixups and clean
ups"
* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (207 commits)
scsi: ufs: mcq: Use pointer arithmetic in ufshcd_send_command()
scsi: ufs: mcq: Annotate ufshcd_inc_sq_tail() appropriately
scsi: cxlflash: s/semahpore/semaphore/
scsi: lpfc: Silence an incorrect device output
scsi: mpi3mr: Use IRQ save variants of spinlock to protect chain frame allocation
scsi: scsi_debug: Fix missing error code in scsi_debug_init()
scsi: hisi_sas: Work around build failure in suspend function
scsi: lpfc: Fix ioremap issues in lpfc_sli4_pci_mem_setup()
scsi: mpt3sas: Fix an issue when driver is being removed
scsi: mpt3sas: Remove HBA BIOS version in the kernel log
scsi: target: core: Fix invalid memory access
scsi: scsi_debug: Drop sdebug_queue
scsi: scsi_debug: Only allow sdebug_max_queue be modified when no shosts
scsi: scsi_debug: Use scsi_host_busy() in delay_store() and ndelay_store()
scsi: scsi_debug: Use blk_mq_tagset_busy_iter() in stop_all_queued()
scsi: scsi_debug: Use blk_mq_tagset_busy_iter() in sdebug_blk_mq_poll()
scsi: scsi_debug: Dynamically allocate sdebug_queued_cmd
scsi: scsi_debug: Use scsi_block_requests() to block queues
scsi: scsi_debug: Protect block_unblock_all_queues() with mutex
scsi: scsi_debug: Change shost list lock to a mutex
...
Linus Torvalds [Wed, 26 Apr 2023 20:09:45 +0000 (13:09 -0700)]
Merge tag 'ata-6.4-rc1' of git://git./linux/kernel/git/dlemoal/libata
Pull ata updates from Damien Le Moal:
- Many cleanups of the pata_parport driver and of its protocol modules
(Ondrej)
- Remove unused code (ata_id_xxx() functions) (Sergey)
- Add Add UniPhier SATA controller DT bindings (Kunihiko)
- Fix dependencies for the Freescale QorIQ AHCI SATA controller driver
(Geert)
- DT property handling improvements (Rob)
* tag 'ata-6.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata: (57 commits)
ata: pata_parport-bpck6: Declare mode_map as static
ata: pata_parport-bpck6: Remove dependency on 64BIT
ata: pata_parport-bpck6: reduce indents in bpck6_open
ata: pata_parport-bpck6: delete ppc6lnx.c
ata: pata_parport-bpck6: move defines and mode_map to bpck6.c
ata: pata_parport-bpck6: move ppc6_wr_data_byte to bpck6.c and rename
ata: pata_parport-bpck6: move ppc6_rd_data_byte to bpck6.c and rename
ata: pata_parport-bpck6: move ppc6_send_cmd to bpck6.c and rename
ata: pata_parport-bpck6: move ppc6_deselect to bpck6.c and rename
ata: pata_parport-bpck6: merge ppc6_select into bpck6_open
ata: pata_parport-bpck6: move ppc6_open to bpck6.c and rename
ata: pata_parport-bpck6: move ppc6_wr_extout to bpck6.c and rename
ata: pata_parport-bpck6: move ppc6_wait_for_fifo to bpck6.c and rename
ata: pata_parport-bpck6: merge ppc6_wr_data_blk into bpck6_write_block
ata: pata_parport-bpck6: merge ppc6_rd_data_blk into bpck6_read_block
ata: pata_parport-bpck6: merge ppc6_wr_port16_blk into bpck6_write_block
ata: pata_parport-bpck6: merge ppc6_rd_port16_blk into bpck6_read_block
ata: pata_parport-bpck6: merge ppc6_wr_port into bpck6_write_regr
ata: pata_parport-bpck6: merge ppc6_rd_port into bpck6_read_regr
ata: pata_parport-bpck6: remove ppc6_close
...
Linus Torvalds [Wed, 26 Apr 2023 20:05:21 +0000 (13:05 -0700)]
Merge tag 'for-6.4/dm-changes' of git://git./linux/kernel/git/device-mapper/linux-dm
Pull device mapper updates from Mike Snitzer:
- Split dm-bufio's rw_semaphore and rbtree. Offers improvements to
dm-bufio's locking to allow increased concurrent IO -- particularly
for read access for buffers already in dm-bufio's cache.
- Also split dm-bio-prison-v1's spinlock and rbtree with comparable aim
at improving concurrent IO (for the DM thinp target).
- Both the dm-bufio and dm-bio-prison-v1 scaling of the number of locks
and rbtrees used are managed by dm_num_hash_locks(). And the hash
function used by both is dm_hash_locks_index().
- Allow DM targets to require DISCARD, WRITE_ZEROES and SECURE_ERASE to
be split at the target specified boundary (in terms of
max_discard_sectors, max_write_zeroes_sectors and
max_secure_erase_sectors respectively).
- DM verity error handling fix for check_at_most_once on FEC.
- Update DM verity target to emit audit events on verification failure
and more.
- DM core ->io_hints improvements needed in support of new discard
support that is added to the DM "zero" and "error" targets.
- Fix missing kmem_cache_destroy() call in initialization error path of
both the DM integrity and DM clone targets.
- A couple fixes for DM flakey, also add "error_reads" feature.
- Fix DM core's resume to not lock FS when the DM map is NULL;
otherwise initial table load can race with FS mount that takes
superblock's ->s_umount rw_semaphore.
- Various small improvements to both DM core and DM targets.
* tag 'for-6.4/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: (40 commits)
dm: don't lock fs when the map is NULL in process of resume
dm flakey: add an "error_reads" option
dm flakey: remove trailing space in the table line
dm flakey: fix a crash with invalid table line
dm ioctl: fix nested locking in table_clear() to remove deadlock concern
dm: unexport dm_get_queue_limits()
dm: allow targets to require splitting WRITE_ZEROES and SECURE_ERASE
dm: add helper macro for simple DM target module init and exit
dm raid: remove unused d variable
dm: remove unnecessary (void*) conversions
dm mirror: add DMERR message if alloc_workqueue fails
dm: push error reporting down to dm_register_target()
dm integrity: call kmem_cache_destroy() in dm_integrity_init() error path
dm clone: call kmem_cache_destroy() in dm_clone_init() error path
dm error: add discard support
dm zero: add discard support
dm table: allow targets without devices to set ->io_hints
dm verity: emit audit events on verification failure and more
dm verity: fix error handling for check_at_most_once on FEC
dm: improve hash_locks sizing and hash function
...
Linus Torvalds [Wed, 26 Apr 2023 19:52:58 +0000 (12:52 -0700)]
Merge tag 'for-6.4/block-2023-04-21' of git://git.kernel.dk/linux
Pull block updates from Jens Axboe:
- drbd patches, bringing us closer to unifying the out-of-tree version
and the in tree one (Andreas, Christoph)
- support for auto-quiesce for the s390 dasd driver (Stefan)
- MD pull request via Song:
- md/bitmap: Optimal last page size (Jon Derrick)
- Various raid10 fixes (Yu Kuai, Li Nan)
- md: add error_handlers for raid0 and linear (Mariusz Tkaczyk)
- NVMe pull request via Christoph:
- Drop redundant pci_enable_pcie_error_reporting (Bjorn Helgaas)
- Validate nvmet module parameters (Chaitanya Kulkarni)
- Fence TCP socket on receive error (Chris Leech)
- Fix async event trace event (Keith Busch)
- Minor cleanups (Chaitanya Kulkarni, zhenwei pi)
- Fix and cleanup nvmet Identify handling (Damien Le Moal,
Christoph Hellwig)
- Fix double blk_mq_complete_request race in the timeout handler
(Lei Yin)
- Fix irq locking in nvme-fcloop (Ming Lei)
- Remove queue mapping helper for rdma devices (Sagi Grimberg)
- use structured request attribute checks for nbd (Jakub)
- fix blk-crypto race conditions between keyslot management (Eric)
- add sed-opal support for reading read locking range attributes
(Ondrej)
- make fault injection configurable for null_blk (Akinobu)
- clean up the request insertion API (Christoph)
- clean up the queue running API (Christoph)
- blkg config helper cleanups (Tejun)
- lazy init support for blk-iolatency (Tejun)
- various fixes and tweaks to ublk (Ming)
- remove hybrid polling. It hasn't really been useful since we got
async polled IO support, and these days we don't support sync polled
IO at all (Keith)
- misc fixes, cleanups, improvements (Zhong, Ondrej, Colin, Chengming,
Chaitanya, me)
* tag 'for-6.4/block-2023-04-21' of git://git.kernel.dk/linux: (118 commits)
nbd: fix incomplete validation of ioctl arg
ublk: don't return 0 in case of any failure
sed-opal: geometry feature reporting command
null_blk: Always check queue mode setting from configfs
block: ublk: switch to ioctl command encoding
blk-mq: fix the blk_mq_add_to_requeue_list call in blk_kick_flush
block, bfq: Fix division by zero error on zero wsum
fault-inject: fix build error when FAULT_INJECTION_CONFIGFS=y and CONFIGFS_FS=m
block: store bdev->bd_disk->fops->submit_bio state in bdev
block: re-arrange the struct block_device fields for better layout
md/raid5: remove unused working_disks variable
md/raid10: don't call bio_start_io_acct twice for bio which experienced read error
md/raid10: fix memleak of md thread
md/raid10: fix memleak for 'conf->bio_split'
md/raid10: fix leak of 'r10bio->remaining' for recovery
md/raid10: don't BUG_ON() in raise_barrier()
md: fix soft lockup in status_resync
md: add error_handlers for raid0 and linear
md: Use optimal I/O size for last bitmap page
md: Fix types in sb writer
...
Linus Torvalds [Wed, 26 Apr 2023 19:40:31 +0000 (12:40 -0700)]
Merge tag 'for-6.4/io_uring-2023-04-21' of git://git.kernel.dk/linux
Pull io_uring updates from Jens Axboe:
- Cleanup of the io-wq per-node mapping, notably getting rid of it so
we just have a single io_wq entry per ring (Breno)
- Followup to the above, move accounting to io_wq as well and
completely drop struct io_wqe (Gabriel)
- Enable KASAN for the internal io_uring caches (Breno)
- Add support for multishot timeouts. Some applications use timeouts to
wake someone waiting on completion entries, and this makes it a bit
easier to just have a recurring timer rather than needing to rearm it
every time (David)
- Support archs that have shared cache coloring between userspace and
the kernel, and hence have strict address requirements for mmap'ing
the ring into userspace. This should only be parisc/hppa. (Helge, me)
- XFS has supported O_DIRECT writes without needing to lock the inode
exclusively for a long time, and ext4 now supports it as well. This
is true for the common cases of not extending the file size. Flag the
fs as having that feature, and utilize that to avoid serializing
those writes in io_uring (me)
- Enable completion batching for uring commands (me)
- Revert patch adding io_uring restriction to what can be GUP mapped or
not. This does not belong in io_uring, as io_uring isn't really
special in this regard. Since this is also getting in the way of
cleanups and improvements to the GUP code, get rid of if (me)
- A few series greatly reducing the complexity of registered resources,
like buffers or files. Not only does this clean up the code a lot,
the simplified code is also a LOT more efficient (Pavel)
- Series optimizing how we wait for events and run task_work related to
it (Pavel)
- Fixes for file/buffer unregistration with DEFER_TASKRUN (Pavel)
- Misc cleanups and improvements (Pavel, me)
* tag 'for-6.4/io_uring-2023-04-21' of git://git.kernel.dk/linux: (71 commits)
Revert "io_uring/rsrc: disallow multi-source reg buffers"
io_uring: add support for multishot timeouts
io_uring/rsrc: disassociate nodes and rsrc_data
io_uring/rsrc: devirtualise rsrc put callbacks
io_uring/rsrc: pass node to io_rsrc_put_work()
io_uring/rsrc: inline io_rsrc_put_work()
io_uring/rsrc: add empty flag in rsrc_node
io_uring/rsrc: merge nodes and io_rsrc_put
io_uring/rsrc: infer node from ctx on io_queue_rsrc_removal
io_uring/rsrc: remove unused io_rsrc_node::llist
io_uring/rsrc: refactor io_queue_rsrc_removal
io_uring/rsrc: simplify single file node switching
io_uring/rsrc: clean up __io_sqe_buffers_update()
io_uring/rsrc: inline switch_start fast path
io_uring/rsrc: remove rsrc_data refs
io_uring/rsrc: fix DEFER_TASKRUN rsrc quiesce
io_uring/rsrc: use wq for quiescing
io_uring/rsrc: refactor io_rsrc_ref_quiesce
io_uring/rsrc: remove io_rsrc_node::done
io_uring/rsrc: use nospec'ed indexes
...
Linus Torvalds [Wed, 26 Apr 2023 16:42:10 +0000 (09:42 -0700)]
Merge tag 'f2fs-for-6.4-rc1' of git://git./linux/kernel/git/jaegeuk/f2fs
Pull f2fs update from Jaegeuk Kim:
"In this round, we've mainly modified to support non-power-of-two zone
size, which is not required for f2fs by design. In order to avoid arch
dependency, we refactored the messy rb_entry structure shared across
different extent_cache. In addition to the improvement, we've also
fixed several subtle bugs and error cases.
Enhancements:
- support non-power-of-two zone size for zoned device
- remove sharing the rb_entry structure in extent cache
- refactor f2fs_gc to call checkpoint in urgent condition
- support iopoll
Bug fixes:
- fix potential corruption when moving a directory
- fix to avoid use-after-free for cached IPU bio
- fix the folio private usage
- avoid kernel warnings or panics in the cp_error case
- fix to recover quota data correctly
- fix some bugs in atomic operations
- fix system crash due to lack of free space in LFS
- fix null pointer panic in tracepoint in __replace_atomic_write_block
- fix iostat lock protection
- fix scheduling while atomic in decompression path
- preserve direct write semantics when buffering is forced
- fix to call f2fs_wait_on_page_writeback() in f2fs_write_raw_pages()"
* tag 'f2fs-for-6.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (52 commits)
f2fs: remove unnessary comment in __may_age_extent_tree
f2fs: allocate node blocks for atomic write block replacement
f2fs: use cow inode data when updating atomic write
f2fs: remove power-of-two limitation of zoned device
f2fs: allocate trace path buffer from names_cache
f2fs: add has_enough_free_secs()
f2fs: relax sanity check if checkpoint is corrupted
f2fs: refactor f2fs_gc to call checkpoint in urgent condition
f2fs: remove folio_detach_private() in .invalidate_folio and .release_folio
f2fs: remove bulk remove_proc_entry() and unnecessary kobject_del()
f2fs: support iopoll method
f2fs: remove batched_trim_sections node description
f2fs: fix to check return value of inc_valid_block_count()
f2fs: fix to check return value of f2fs_do_truncate_blocks()
f2fs: fix passing relative address when discard zones
f2fs: fix potential corruption when moving a directory
f2fs: add radix_tree_preload_end in error case
f2fs: fix to recover quota data correctly
f2fs: fix to check readonly condition correctly
docs: f2fs: Correct instruction to disable checkpoint
...
Linus Torvalds [Wed, 26 Apr 2023 16:36:55 +0000 (09:36 -0700)]
Merge tag 'dlm-6.4' of git://git./linux/kernel/git/teigland/linux-dlm
Pull dlm updates from David Teigland:
- Remove some unused features (related to lock timeouts) that have been
previously scheduled for removal
- Fix a bug where the pending callback flag would be incorrectly
cleared, which could potentially result in missing a completion
callback
- Use an unbound workqueue for dlm socket handling so that socket
operations can be processed with less delay
- Fix possible lockspace join connection errors with large clusters
(e.g. over 16 nodes) caused by a small socket backlog setting
- Use atomic bit ops for internal flags to help avoid mistakes copying
flag values from messages
- Fix recently introduced bug where memory for lvb data could be
unnecessarily allocated for a lock
* tag 'dlm-6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm:
fs: dlm: stop unnecessarily filling zero ms_extra bytes
fs: dlm: switch lkb_sbflags to atomic ops
fs: dlm: rsb hash table flag value to atomic ops
fs: dlm: move internal flags to atomic ops
fs: dlm: change dflags to use atomic bits
fs: dlm: store lkb distributed flags into own value
fs: dlm: remove DLM_IFL_LOCAL_MS flag
fs: dlm: rename stub to local message flag
fs: dlm: remove deprecated code parts
DLM: increase socket backlog to avoid hangs with 16 nodes
fs: dlm: add unbound flag to dlm_io workqueue
fs: dlm: fix DLM_IFL_CB_PENDING gets overwritten
Linus Torvalds [Wed, 26 Apr 2023 16:28:15 +0000 (09:28 -0700)]
Merge tag 'gfs2-v6.3-rc3-fixes' of git://git./linux/kernel/git/gfs2/linux-gfs2
Pull gfs2 updates from Andreas Gruenbacher:
- Fix revoke processing at unmount and on read-only remount
- Refuse reading in inodes with an impossible indirect block height
- Various minor cleanups
* tag 'gfs2-v6.3-rc3-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2:
gfs2: gfs2_ail_empty_gl no log flush on error
gfs2: Issue message when revokes cannot be written
gfs2: Perform second log flush in gfs2_make_fs_ro
gfs2: return errors from gfs2_ail_empty_gl
gfs2: Move variable assignment behind a null pointer check in inode_go_dump
gfs2: Use gfs2_holder_initialized for jindex
gfs2: Eliminate gfs2_trim_blocks
gfs2: Fix inode height consistency check
gfs2: Remove ghs[] from gfs2_unlink
gfs2: Remove ghs[] from gfs2_link
gfs2: Remove duplicate i_nlink check from gfs2_link()
Linus Torvalds [Wed, 26 Apr 2023 16:13:44 +0000 (09:13 -0700)]
Merge tag 'for-6.4-tag' of git://git./linux/kernel/git/kdave/linux
Pull btrfs updates from David Sterba:
"Mostly core changes and cleanups, some notable fixes and two
performance improvements in directory logging.
The IO path cleanups are removing or refactoring old code, scrub main
loop has been completely rewritten also refactoring old code.
There are some changes to non-btrfs code, mostly trivial, the cgroup
punt bio logic is only moved from generic code.
Performance improvements:
- improve logging changes in a directory during one transaction,
avoid iterating over items and reduce lock contention (fsync time
4x lower)
- when logging directory entries during one transaction, reduce
locking of subvolume trees by checking tree-log instead
(improvement in throughput and latency for concurrent access to a
subvolume)
Notable fixes:
- dev-replace:
- properly honor read mode when requested to avoid reading from
source device
- target device won't be used for eventual read repair, this is
unreliable for NODATASUM files
- when there are unpaired (and unrepairable) metadata during
replace, exit early with error and don't try to finish whole
operation
- scrub ioctl properly rejects unknown flags
- fix global block reserve calculations
- fix partial direct io write when there's a page fault in the
middle, iomap will try to continue with partial request but the
btrfs part did not match that, this can lead to zeros written
instead of data
Core changes:
- io path:
- continued cleanups and refactoring around bio handling
- extent io submit path simplifications and cleanups
- flush write path simplifications and cleanups
- rework logic of passing sync mode of bio, with further cleanups
- rewrite scrub code flow, restructure how the stripes are enumerated
and verified in a more unified way
- allow to set lower threshold for block group reclaim in debug mode
to aid zoned mode testing
- remove obsolete time-based delayed ref throttling logic when
truncating items
- DREW locks are not using percpu variables anymore
- more warning fixes (-Wmaybe-uninitialized)
- u64 division simplifications
- error handling improvements
Non-btrfs code changes:
- push cgroup punt bio logic to btrfs code (there was no other user
of that), the functionality can be now selected separately by
BLK_CGROUP_PUNT_BIO
- crc32c_impl removed after removing last uses in btrfs code
- add btrfs_assertfail() to objtool table"
* tag 'for-6.4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: (147 commits)
btrfs: mark btrfs_assertfail() __noreturn
btrfs: fix uninitialized variable warnings
btrfs: use log root when iterating over index keys when logging directory
btrfs: avoid iterating over all indexes when logging directory
btrfs: dev-replace: error out if we have unrepaired metadata error during
btrfs: remove pointless loop at btrfs_get_next_valid_item()
btrfs: scrub: reject unsupported scrub flags
btrfs: reinterpret async discard iops_limit=0 as no delay
btrfs: set default discard iops_limit to 1000
btrfs: remove unused raid56 functions which were dedicated for scrub
btrfs: scrub: remove scrub_bio structure
btrfs: scrub: remove scrub_block and scrub_sector structures
btrfs: scrub: remove the old scrub recheck code
btrfs: scrub: remove the old writeback infrastructure
btrfs: scrub: remove scrub_parity structure
btrfs: scrub: use scrub_stripe to implement RAID56 P/Q scrub
btrfs: scrub: switch scrub_simple_mirror() to scrub_stripe infrastructure
btrfs: scrub: introduce helper to queue a stripe for scrub
btrfs: scrub: introduce error reporting functionality for scrub_stripe
btrfs: scrub: introduce a writeback helper for scrub_stripe
...
Linus Torvalds [Wed, 26 Apr 2023 16:07:46 +0000 (09:07 -0700)]
Merge tag 'fs_for_v6.4-rc1' of git://git./linux/kernel/git/jack/linux-fs
Pull ext2, reiserfs, udf, and quota updates from Jan Kara:
"A couple of small fixes and cleanups for ext2, udf, reiserfs, and
quota.
The biggest change is making CONFIG_PRINT_QUOTA_WARNING depend on
BROKEN with an outlook for removing it completely in an year or so"
* tag 'fs_for_v6.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
quota: mark PRINT_QUOTA_WARNING as BROKEN
quota: update Kconfig comment
reiserfs: remove unused iter variable
quota: Use register_sysctl_init() for registering fs_dqstats_table
reiserfs: remove unused sched_count variable
ext2: remove redundant assignment to pointer end
quota: make dquot_set_dqinfo return errors from ->write_info
quota: fixup *_write_file_info() to return proper error code
quota: simplify two-level sysctl registration for fs_dqstats_table
udf: use wrapper i_blocksize() in udf_discard_prealloc()
udf: Use folios in udf_adinicb_writepage()
ext2: Check block size validity during mount
ext2: Correct maximum ext2 filesystem block size
Linus Torvalds [Wed, 26 Apr 2023 15:57:41 +0000 (08:57 -0700)]
Merge tag 'ext4_for_linus' of git://git./linux/kernel/git/tytso/ext4
Pull ext4 updates from Ted Ts'o:
"There are a number of major cleanups in ext4 this cycle:
- The data=journal writepath has been significantly cleaned up and
simplified, and reduces a large number of data=journal special
cases by Jan Kara.
- Ojaswin Muhoo has replaced linked list used to track extents that
have been used for inode preallocation with a red-black tree in the
multi-block allocator. This improves performance for workloads
which do a large number of random allocating writes.
- Thanks to Kemeng Shi for a lot of cleanup and bug fixes in the
multi-block allocator.
- Matthew wilcox has converted the code paths for reading and writing
ext4 pages to use folios.
- Jason Yan has continued to factor out ext4_fill_super() into
smaller functions for improve ease of maintenance and
comprehension.
- Josh Triplett has created an uapi header for ext4 userspace API's"
* tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (105 commits)
ext4: Add a uapi header for ext4 userspace APIs
ext4: remove useless conditional branch code
ext4: remove unneeded check of nr_to_submit
ext4: move dax and encrypt checking into ext4_check_feature_compatibility()
ext4: factor out ext4_block_group_meta_init()
ext4: move s_reserved_gdt_blocks and addressable checking into ext4_check_geometry()
ext4: rename two functions with 'check'
ext4: factor out ext4_flex_groups_free()
ext4: use ext4_group_desc_free() in ext4_put_super() to save some duplicated code
ext4: factor out ext4_percpu_param_init() and ext4_percpu_param_destroy()
ext4: factor out ext4_hash_info_init()
Revert "ext4: Fix warnings when freezing filesystem with journaled data"
ext4: Update comment in mpage_prepare_extent_to_map()
ext4: Simplify handling of journalled data in ext4_bmap()
ext4: Drop special handling of journalled data from ext4_quota_on()
ext4: Drop special handling of journalled data from ext4_evict_inode()
ext4: Fix special handling of journalled data from extent zeroing
ext4: Drop special handling of journalled data from extent shifting operations
ext4: Drop special handling of journalled data from ext4_sync_file()
ext4: Commit transaction before writing back pages in data=journal mode
...
Linus Torvalds [Wed, 26 Apr 2023 15:51:51 +0000 (08:51 -0700)]
Merge tag 'fsverity-for-linus' of git://git./fs/fsverity/linux
Pull fsverity updates from Eric Biggers:
"Several cleanups and fixes for fs/verity/, including a couple minor
fixes to the changes in 6.3 that added support for Merkle tree block
sizes less than the page size"
* tag 'fsverity-for-linus' of git://git.kernel.org/pub/scm/fs/fsverity/linux:
fsverity: reject FS_IOC_ENABLE_VERITY on mode 3 fds
fsverity: explicitly check for buffer overflow in build_merkle_tree()
fsverity: use WARN_ON_ONCE instead of WARN_ON
fs-verity: simplify sysctls with register_sysctl()
fs/buffer.c: use b_folio for fsverity work
Linus Torvalds [Wed, 26 Apr 2023 15:37:51 +0000 (08:37 -0700)]
Merge tag 'fscrypt-for-linus' of git://git./fs/fscrypt/linux
Pull fscrypt updates from Eric Biggers:
"A few cleanups for fs/crypto/, and another patch to prepare for the
upcoming CephFS encryption support"
* tag 'fscrypt-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/linux:
fscrypt: optimize fscrypt_initialize()
fscrypt: use WARN_ON_ONCE instead of WARN_ON
fscrypt: new helper function - fscrypt_prepare_lookup_partial()
fs/buffer.c: use b_folio for fscrypt work
Linus Torvalds [Wed, 26 Apr 2023 15:32:52 +0000 (08:32 -0700)]
Merge tag 'v6.4-p1' of git://git./linux/kernel/git/herbert/crypto-2.6
Pull crypto updates from Herbert Xu:
"API:
- Total usage stats now include all that returned errors (instead of
just some)
- Remove maximum hash statesize limit
- Add cloning support for hmac and unkeyed hashes
- Demote BUG_ON in crypto_unregister_alg to a WARN_ON
Algorithms:
- Use RIP-relative addressing on x86 to prepare for PIE build
- Add accelerated AES/GCM stitched implementation on powerpc P10
- Add some test vectors for cmac(camellia)
- Remove failure case where jent is unavailable outside of FIPS mode
in drbg
- Add permanent and intermittent health error checks in jitter RNG
Drivers:
- Add support for 402xx devices in qat
- Add support for HiSTB TRNG
- Fix hash concurrency issues in stm32
- Add OP-TEE firmware support in caam"
* tag 'v6.4-p1' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (139 commits)
i2c: designware: Add doorbell support for Mendocino
i2c: designware: Use PCI PSP driver for communication
powerpc: Move Power10 feature PPC_MODULE_FEATURE_P10
crypto: p10-aes-gcm - Remove POWER10_CPU dependency
crypto: testmgr - Add some test vectors for cmac(camellia)
crypto: cryptd - Add support for cloning hashes
crypto: cryptd - Convert hash to use modern init_tfm/exit_tfm
crypto: hmac - Add support for cloning
crypto: hash - Add crypto_clone_ahash/shash
crypto: api - Add crypto_clone_tfm
crypto: api - Add crypto_tfm_get
crypto: x86/sha - Use local .L symbols for code
crypto: x86/crc32 - Use local .L symbols for code
crypto: x86/aesni - Use local .L symbols for code
crypto: x86/sha256 - Use RIP-relative addressing
crypto: x86/ghash - Use RIP-relative addressing
crypto: x86/des3 - Use RIP-relative addressing
crypto: x86/crc32c - Use RIP-relative addressing
crypto: x86/cast6 - Use RIP-relative addressing
crypto: x86/cast5 - Use RIP-relative addressing
...
Linus Torvalds [Wed, 26 Apr 2023 15:25:57 +0000 (08:25 -0700)]
Merge tag 'flex-array-transformations-6.4-rc1' of git://git./linux/kernel/git/gustavoars/linux
Pull flexible-array updates from Gustavo Silva:
"Transform more zero-length and one-element arrays into C99
flexible-array members"
* tag 'flex-array-transformations-6.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gustavoars/linux:
uapi: net: ipv6: Replace fake flex-array with flex-array member
drm/vmwgfx: Replace one-element array with flexible-array member
ASoC: uapi: Replace zero-length arrays with __DECLARE_FLEX_ARRAY() helper
Paolo Abeni [Wed, 26 Apr 2023 08:15:31 +0000 (10:15 +0200)]
net: phy: hide the PHYLIB_LEDS knob
commit
4bb7aac70b5d ("net: phy: fix circular LEDS_CLASS dependencies")
solved a build failure, but introduces a new config knob with a default
'y' value: PHYLIB_LEDS.
The latter is against the current new config policy. The exception
was raised to allow the user to catch bad configurations without led
support.
Anyway the current definition of PHYLIB_LEDS does not fit the above
goal: if LEDS_CLASS is disabled, the new config will be available
only with PHYLIB disabled, too.
Hide the mentioned config, to preserve the randconfig testing done so
far, while respecting the mentioned policy.
Suggested-by: Andrew Lunn <andrew@lunn.ch>
Suggested-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Link: https://lore.kernel.org/r/d82489be8ed911c383c3447e9abf469995ccf39a.1682496488.git.pabeni@redhat.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Paolo Abeni [Wed, 26 Apr 2023 08:17:46 +0000 (10:17 +0200)]
Merge git://git./linux/kernel/git/netdev/net
No conflicts.
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Linus Torvalds [Wed, 26 Apr 2023 01:44:10 +0000 (18:44 -0700)]
Merge tag 'pm-6.4-rc1' of git://git./linux/kernel/git/rafael/linux-pm
Pull power management updates from Rafael Wysocki:
"These update several cpufreq drivers and the cpufreq core, add sysfs
interface for exposing the time really spent in the platform low-power
state during suspend-to-idle, update devfreq (core and drivers) and
the pm-graph suite of tools and clean up code.
Specifics:
- Fix the frequency unit in cpufreq_verify_current_freq checks()
Sanjay Chandrashekara)
- Make mode_state_machine in amd-pstate static (Tom Rix)
- Make the cpufreq core require drivers with target_index() to set
freq_table (Viresh Kumar)
- Fix typo in the ARM_BRCMSTB_AVS_CPUFREQ Kconfig entry (Jingyu Wang)
- Use of_property_read_bool() for boolean properties in the pmac32
cpufreq driver (Rob Herring)
- Make the cpufreq sysfs interface return proper error codes on
obviously invalid input (qinyu)
- Add guided autonomous mode support to the AMD P-state driver (Wyes
Karny)
- Make the Intel P-state driver enable HWP IO boost on all server
platforms (Srinivas Pandruvada)
- Add opp and bandwidth support to tegra194 cpufreq driver (Sumit
Gupta)
- Use of_property_present() for testing DT property presence (Rob
Herring)
- Remove MODULE_LICENSE in non-modules (Nick Alcock)
- Add SM7225 to cpufreq-dt-platdev blocklist (Luca Weiss)
- Optimizations and fixes for qcom-cpufreq-hw driver (Krzysztof
Kozlowski, Konrad Dybcio, and Bjorn Andersson)
- DT binding updates for qcom-cpufreq-hw driver (Konrad Dybcio and
Bartosz Golaszewski)
- Updates and fixes for mediatek driver (Jia-Wei Chang and
AngeloGioacchino Del Regno)
- Use of_property_present() for testing DT property presence in the
cpuidle code (Rob Herring)
- Drop unnecessary (void *) conversions from the PM core (Li zeming)
- Add sysfs files to represent time spent in a platform sleep state
during suspend-to-idle and make AMD and Intel PMC drivers use them
Mario Limonciello)
- Use of_property_present() for testing DT property presence (Rob
Herring)
- Add set_required_opps() callback to the 'struct opp_table', to make
the code paths cleaner (Viresh Kumar)
- Update the pm-graph siute of utilities to v5.11 with the following
changes:
* New script which allows users to install the latest pm-graph
from the upstream github repo.
* Update all the dmesg suspend/resume PM print formats to be able
to process recent timelines using dmesg only.
* Add ethtool output to the log for the system's ethernet device
if ethtool exists.
* Make the tool more robustly handle events where mangled dmesg
or ftrace outputs do not include all the requisite data.
- Make the sleepgraph utility recognize "CPU killed" messages (Xueqin
Luo)
- Remove unneeded SRCU selection in Kconfig because it's always set
from devfreq core (Paul E. McKenney)
- Drop of_match_ptr() macro from exynos-bus.c because this driver is
always using the DT table for driver probe (Krzysztof Kozlowski)
- Use the preferred of_property_present() instead of the low-level
of_get_property() on exynos-bus.c (Rob Herring)
- Use devm_platform_get_and_ioream_resource() in exyno-ppmu.c (Yang
Li)"
* tag 'pm-6.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (44 commits)
platform/x86/intel/pmc: core: Report duration of time in HW sleep state
platform/x86/intel/pmc: core: Always capture counters on suspend
platform/x86/amd: pmc: Report duration of time in hw sleep state
PM: Add sysfs files to represent time spent in hardware sleep state
cpufreq: use correct unit when verify cur freq
cpufreq: tegra194: add OPP support and set bandwidth
cpufreq: amd-pstate: Make varaiable mode_state_machine static
PM: core: Remove unnecessary (void *) conversions
cpufreq: drivers with target_index() must set freq_table
PM / devfreq: exynos-ppmu: Use devm_platform_get_and_ioremap_resource()
OPP: Move required opps configuration to specialized callback
OPP: Handle all genpd cases together in _set_required_opps()
cpufreq: qcom-cpufreq-hw: Revert adding cpufreq qos
dt-bindings: cpufreq: cpufreq-qcom-hw: Add QCM2290
dt-bindings: cpufreq: cpufreq-qcom-hw: Sanitize data per compatible
dt-bindings: cpufreq: cpufreq-qcom-hw: Allow just 1 frequency domain
cpufreq: Add SM7225 to cpufreq-dt-platdev blocklist
cpufreq: qcom-cpufreq-hw: fix double IO unmap and resource release on exit
cpufreq: mediatek: Raise proc and sram max voltage for MT7622/7623
cpufreq: mediatek: raise proc/sram max voltage for MT8516
...
Linus Torvalds [Wed, 26 Apr 2023 01:37:41 +0000 (18:37 -0700)]
Merge tag 'acpi-6.4-rc1' of git://git./linux/kernel/git/rafael/linux-pm
Pull ACPI updates from Rafael Wysocki:
"These update the ACPICA code in the kernel to upstream revision
20230331, fix the ACPI SBS driver and the evaluation of the _PDC
method on Xen dom0 in the ACPI processor driver, update the ACPI
driver for Intel SoCs and clean up code in multiple places.
Specifics:
- Update the ACPICA code in the kernel to upstream revision
20230331
including the following changes:
* Delete bogus node_array array of pointers from AEST table
(Jessica Clarke)
* Add support for trace buffer extension in GICC to the ACPI MADT
parser (Xiongfeng Wang)
* Add missing macro ACPI_FUNCTION_TRACE() for
acpi_ns_repair_HID() (Xiongfeng Wang)
* Add missing tables to astable (Pedro Falcato)
* Add support for 64 bit loong_arch compilation to ACPICA (Huacai
Chen)
* Add support for ASPT table in disassembler to ACPICA (Jeremi
Piotrowski)
* Add support for Arm's MPAM ACPI table version 2 (Hesham
Almatary)
* Update all copyrights/signons in ACPICA to 2023 (Bob Moore)
* Add support for ClockInput resource (v6.5) (Niyas Sait)
* Add RISC-V INTC interrupt controller definition to the list of
supported interrupt controllers for MADT (Sunil V L)
* Add structure definitions for the RISC-V RHCT ACPI table (Sunil
V L)
* Address several cases in which the ACPICA code might lead to
undefined behavior (Tamir Duberstein)
* Make ACPICA code support flexible arrays properly (Kees Cook)
* Check null return of ACPI_ALLOCATE_ZEROED in
acpi_db_display_objects() (void0red)
* Add os specific support for Zephyr RTOS to ACPICA (Najumon)
* Update version to
20230331 (Bob Moore)
- Fix evaluating the _PDC ACPI control method when running as Xen
dom0 (Roger Pau Monne)
- Use platform devices to load ACPI PPC and PCC drivers (Petr Pavlu)
- Check for null return of devm_kzalloc() in fch_misc_setup() (Kang
Chen)
- Log a message if enable_irq_wake() fails for the ACPI SCI (Simon
Gaiser)
- Initialize the correct IOMMU fwspec while parsing ACPI VIOT
(Jean-Philippe Brucker)
- Amend indentation and prefix error messages with FW_BUG in the ACPI
SPCR parsing code (Andy Shevchenko)
- Enable ACPI sysfs support for CCEL records (Kuppuswamy
Sathyanarayanan)
- Make the APEI error injection code warn on invalid arguments when
explicitly indicated by platform (Shuai Xue)
- Add CXL error types to the error injection code in APEI (Tony Luck)
- Refactor acpi_data_prop_read_single() (Andy Shevchenko)
- Fix two issues in the ACPI SBS driver (Armin Wolf)
- Replace ternary operator with min_t() in the generic ACPI thermal
zone driver (Jiangshan Yi)
- Ensure that ACPI notify handlers are not running after removal and
clean up code in acpi_sb_notify() (Rafael Wysocki)
- Remove register_backlight_delay module option and code and remove
quirks for false-positive backlight control support advertised on
desktop boards (Hans de Goede)
- Replace irqdomain.h include with struct declarations in ACPI
headers and update several pieces of code previously including of.h
implicitly through those headers (Rob Herring)
- Fix acpi_evaluate_dsm_typed() redefinition error (Kiran K)
- Update the pm_profile sysfs attribute documentation (Rafael
Wysocki)
- Add
80862289 ACPI _HID for second PWM controller on Cherry Trail to
the ACPI driver for Intel SoCs (Hans de Goede)"
* tag 'acpi-6.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (64 commits)
ACPI: LPSS: Add
80862289 ACPI _HID for second PWM controller on Cherry Trail
ACPI: bus: Ensure that notify handlers are not running after removal
ACPI: bus: Add missing braces to acpi_sb_notify()
ACPI: video: Remove desktops without backlight DMI quirks
ACPI: video: Remove register_backlight_delay module option and code
ACPI: Replace irqdomain.h include with struct declarations
fpga: lattice-sysconfig-spi: Add explicit include for of.h
tpm: atmel: Add explicit include for of.h
virtio-mmio: Add explicit include for of.h
pata: ixp4xx: Add explicit include for of.h
ata: pata_macio: Add explicit include of irqdomain.h
serial: 8250_tegra: Add explicit include for of.h
net: rfkill-gpio: Add explicit include for of.h
staging: iio: resolver: ad2s1210: Add explicit include for of.h
iio: adc: ad7292: Add explicit include for of.h
ACPICA: Update version to
20230331
ACPICA: add os specific support for Zephyr RTOS
ACPICA: ACPICA: check null return of ACPI_ALLOCATE_ZEROED in acpi_db_display_objects
ACPICA: acpi_resource_irq: Replace 1-element arrays with flexible array
ACPICA: acpi_madt_oem_data: Fix flexible array member definition
...
Linus Torvalds [Wed, 26 Apr 2023 01:32:43 +0000 (18:32 -0700)]
Merge tag 'thermal-6.4-rc1' of git://git./linux/kernel/git/rafael/linux-pm
Pull thermal control updates from Rafael Wysocki:
"These mostly continue to prepare the thermal control subsystem for
using unified representation of trip points, which includes cleanups,
code refactoring and similar and update several drivers (for other
reasons), which includes new hardware support.
Specifics:
- Add a thermal zone 'devdata' accessor and modify several drivers to
use it (Daniel Lezcano)
- Prevent drivers from using the 'device' internal thermal zone
structure field directly (Daniel Lezcano)
- Clean up the hwmon thermal driver (Daniel Lezcano)
- Add thermal zone id accessor and thermal zone type accessor and
prevent drivers from using thermal zone fields directly (Daniel
Lezcano)
- Clean up the acerhdf and tegra thermal drivers (Daniel Lezcano)
- Add lower bound check for sysfs input to the x86_pkg_temp_thermal
Intel thermal driver (Zhang Rui)
- Add more thermal zone device encapsulation: prevent setting
structure field directly, access the sensor device instead the
thermal zone's device for trace, relocate the traces in
drivers/thermal (Daniel Lezcano)
- Use the generic trip point for the i.MX and remove the
get_trip_temp ops (Daniel Lezcano)
- Use the devm_platform_ioremap_resource() in the Hisilicon driver
(Yang Li)
- Remove R-Car H3 ES1.* handling as public has only access to the ES2
version and the upstream support for the ES1 has been shutdown
(Wolfram Sang)
- Add a delay after initializing the bank in order to let the time to
the hardware to initialze itself before reading the temperature
(Amjad Ouled-Ameur)
- Add MT8365 support (Amjad Ouled-Ameur)
- Preparational cleanup and DT bindings for RK3588 support (Sebastian
Reichel)
- Add driver support for RK3588 (Finley Xiao)
- Use devm_reset_control_array_get_exclusive() for the Rockchip
driver (Ye Xingchen)
- Detect power gated thermal zones and return -EAGAIN when reading
the temperature (Mikko Perttunen)
- Remove thermal_bind_params structure as it is unused (Zhang Rui)
- Drop unneeded quotes in DT bindings allowing to run yamllint (Rob
Herring)
- Update the power allocator documentation according to the thermal
trace relocation (Lukas Bulwahn)
- Fix sensor 1 interrupt status bitmask for the Mediatek LVTS sensor
(Chen-Yu Tsai)
- Use the dev_err_probe() helper in the Amlogic driver (Ye Xingchen)
- Add AP domain support to LVTS thermal controllers for mt8195
(Balsam CHIHI)
- Remove buggy call to thermal_of_zone_unregister() (Daniel Lezcano)
- Make thermal_of_zone_[un]register() private to the thermal OF code
(Daniel Lezcano)
- Create a private copy of the thermal zone device parameters
structure when registering a thermal zone (Daniel Lezcano)
- Fix a kernel NULL pointer dereference in thermal_hwmon (Zhang Rui)
- Revert recent message adjustment in thermal_hwmon (Rafael Wysocki)
- Use of_property_present() for testing DT property presence in
thermal control code (Rob Herring)
- Clean up thermal_list_lock locking in the thermal core (Rafael
Wysocki)
- Add DLVR support for RFIM control in the int340x Intel thermal
driver (Srinivas Pandruvada)"
* tag 'thermal-6.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (55 commits)
thermal: intel: int340x: Add DLVR support for RFIM control
thermal/core: Alloc-copy-free the thermal zone parameters structure
thermal/of: Unexport unused OF functions
thermal/drivers/bcm2835: Remove buggy call to thermal_of_zone_unregister
thermal/drivers/mediatek/lvts_thermal: Add AP domain for mt8195
dt-bindings: thermal: mediatek: Add AP domain to LVTS thermal controllers for mt8195
thermal: amlogic: Use dev_err_probe()
thermal/drivers/mediatek/lvts_thermal: Fix sensor 1 interrupt status bitmask
MAINTAINERS: adjust entry in THERMAL/POWER_ALLOCATOR after header movement
dt-bindings: thermal: Drop unneeded quotes
thermal/core: Remove thermal_bind_params structure
thermal/drivers/tegra-bpmp: Handle offline zones
thermal/drivers/rockchip: use devm_reset_control_array_get_exclusive()
dt-bindings: rockchip-thermal: Support the RK3588 SoC compatible
thermal/drivers/rockchip: Support RK3588 SoC in the thermal driver
thermal/drivers/rockchip: Support dynamic sized sensor array
thermal/drivers/rockchip: Simplify channel id logic
thermal/drivers/rockchip: Use dev_err_probe
thermal/drivers/rockchip: Simplify clock logic
thermal/drivers/rockchip: Simplify getting match data
...
Linus Torvalds [Wed, 26 Apr 2023 00:43:44 +0000 (17:43 -0700)]
Merge tag 'hwmon-for-v6.4' of git://git./linux/kernel/git/groeck/linux-staging
Pull hwmon updates from Guenter Roeck:
"New drivers
- Driver for Acbel FSB032 power supply
- Driver for StarFive JH71x0 temperature sensor
Added support to existing drivers:
- aquacomputer_d5next: Support for Aquacomputer Aquastream XT
- nct6775: Added various ASUS boards to list of boards supporting WMI
- asus-ec-sensors: ROG STRIX Z390-F GAMING, ProArt B550-Creator,
Notable improvements:
- Regulator event and sysfs notification support for PMBus drivers
Notable cleanup:
- Constified pointers to hwmon_channel_info
.. and various other minor bug fixes and improvements"
* tag 'hwmon-for-v6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging: (131 commits)
hwmon: lochnagar: Remove the unneeded include <linux/i2c.h>
hwmon: (pmbus/fsp-3y) Fix functionality bitmask in FSP-3Y YM-2151E
hwmon: (adt7475) Use device_property APIs when configuring polarity
hwmon: (aquacomputer_d5next) Add support for Aquacomputer Aquastream XT
hwmon: (it87) Disable/enable SMBus access for IT8622E chipset
hwmon: (it87) Add calls to smbus_enable/smbus_disable as required
hwmon: (it87) Test for error in it87_update_device
hwmon: (it87) Disable SMBus access for environmental controller registers.
docs: hwmon: Add documentaion for acbel-fsg032 PSU
hwmon: (pmbus/acbel-fsg032) Add Acbel power supply
dt-bindings: trivial-devices: Add acbel,fsg032
dt-bindings: vendor-prefixes: Add prefix for acbel
hwmon: (sfctemp) Simplify error message
hwmon: (pmbus/ibm-cffps) Use default debugfs attributes and lock function
hwmon: (pmbus/core) Add lock and unlock functions
hwmon: (pmbus/core) Request threaded interrupt with IRQF_ONESHOT
hwmon: (nct6775) update ASUS WMI monitoring list A620/B760/W790
hwmon: ina2xx: add optional regulator support
dt-bindings: hwmon: ina2xx: add supply property
dt-bindings: hwmon: pwm-fan: Convert to DT schema
...
Linus Torvalds [Wed, 26 Apr 2023 00:38:25 +0000 (17:38 -0700)]
Merge tag 'rproc-v6.4' of git://git./linux/kernel/git/remoteproc/linux
Pull remoteproc updates from Bjorn Andersson:
- Unnecessary type casts from the 'void *' rproc->priv pointer are
dropped throughout the subsystem.
- A kernel-doc error is corrected in the Mediatek SCPI IPI
implementation
- The firmware loading onto the IMX DSP remote processors is reworked
to avoid non-32bit memory operations. A module parameter is
introduced to assist development of firmware without communication
abilities in place. Error paths in imx_dsp_rproc_mbox_alloc() is
cleaned up
- The cluster configuration handling in the TI K3 R5 driver is
corrected and support for the single-R5 core found in the TI AM62x
SoC family is introduced
- The TI PRU driver device- to virtual-address translation is updated
to avoid compiler warning about the unsigned device-address always
being larger than 0
- The ST remoteproc driver is transitioned to use of_property_present()
- Issues with kicks arriving after the STM32 remote processor has been
shut down are mitigated by checking the processor's state before
handling them.
- Support for mailbox channels for communication with the remote
processors are added to the Xilinx R5 remoteproc driver. The naming
of carveouts are corrected and their parsing is reworked. For this a
couple of fixes targeting the mailbox subsystem are picked up here as
well.
- Reference counting of of_nodes are corrected in the ST, STM32, RCAR
and IMX remoteproc drivers
* tag 'rproc-v6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/remoteproc/linux: (24 commits)
remoteproc: st: Use of_property_present() for testing DT property presence
dt-bindings: remoteproc: Drop unneeded quotes
remoteproc: imx_dsp_rproc: Fix kernel test robot sparse warning
remoteproc: imx_dsp_rproc: Improve exception handling in imx_dsp_rproc_mbox_alloc()
remoteproc: pru: Remove always true check positive unsigned value
dt-bindings: remoteproc: stm32-rproc: Typo fix
remoteproc: stm32_rproc: Add mutex protection for workqueue
remoteproc: Remove unnecessary (void*) conversions
remoteproc: imx_dsp_rproc: Call of_node_put() on iteration error
remoteproc: imx_rproc: Call of_node_put() on iteration error
remoteproc: rcar_rproc: Call of_node_put() on iteration error
remoteproc: st: Call of_node_put() on iteration error
remoteproc: stm32: Call of_node_put() on iteration error
remoteproc: k3-r5: Use separate compatible string for TI AM62x SoC family
dt-bindings: remoteproc: ti: Add new compatible for AM62 SoC family
remoteproc: k3-r5: Simplify cluster mode setting usage
remoteproc/mtk_scpi_ipi: Fix one kernel-doc comment
remoteproc: xilinx: Add mailbox channels for rpmsg
drivers: remoteproc: xilinx: Fix carveout names
mailbox: zynqmp: Fix typo in IPI documentation
...
Linus Torvalds [Wed, 26 Apr 2023 00:33:55 +0000 (17:33 -0700)]
Merge tag 'rpmsg-v6.4' of git://git./linux/kernel/git/remoteproc/linux
Pull rpmsg updates from Bjorn Andersson:
"The remove functions of the Qualcomm SMD and GLINK RPM platform
drivers are transitioned to the new void returning prototype. Likewise
is qcom_smd_unregister_edge() transitioned to void, as it
unconditionally returned 0.
An assumption about the ordering of the intent request acknowledgement
and advertisement of a new intent in the GLINK implementation is
corrected. Faulty error handling is corrected is improved in the TX
path, and duplicated code, in the same path, is cleaned up"
* tag 'rpmsg-v6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/remoteproc/linux:
rpmsg: glink: Consolidate TX_DATA and TX_DATA_CONT
rpmsg: glink: Propagate TX failures in intentless mode as well
rpmsg: glink: Wait for intent, not just request ack
rpmsg: glink: Transition intent request signaling to wait queue
rpmsg: qcom_smd: Convert to platform remove callback returning void
rpmsg: qcom_glink_rpm: Convert to platform remove callback returning void
rpmsg: qcom_smd: Make qcom_smd_unregister_edge() return void
Linus Torvalds [Wed, 26 Apr 2023 00:29:46 +0000 (17:29 -0700)]
Merge tag 'mmc-v6.4' of git://git./linux/kernel/git/ulfh/mmc
Pull MMC updates from Ulf Hansson:
"MMC core:
- Allow an invalid regulator in mmc_regulator_set_ocr()
- Log about empty non-removable slots
- Add helpers to enable/disable the vqmmc regulator
MMC host:
- mtk-sd: Add support for the mt8365 variant
- renesas_sdhi: Remove support for R-Car H3 ES1.* variants
- sdhci_am654: Add power management support
- sdhci-cadence: Add support for eMMC hardware reset
- sdhci-cadence: Add support for AMD Pensando Elba variant
- sdhci-msm: Add support for the IPQ5018 variant
- sdhci-msm: Add support for the QCM2290 variant
- sdhci-of-arasan: Skip setting clock delay for 400KHz
- sdhci-of-arasan: Add support for the Xilinx Versal Net variant
- sdhci-of-arasan: Remove Intel Thunder Bay SOC support
- sdhci-of-arasan: Add support to request the "gate" clock
- sdhci-of-dwcmshc: Properly determine max clock on Rockchip
- sdhci-of-esdhc: Fix quirk to ignore command inhibit for data
- sdhci-pci-o2micro: Fix SDR50 mode timing issue
MEMSTICK:
- r592: Fix use-after-free bug in r592_remove due to race condition"
* tag 'mmc-v6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc: (40 commits)
dt-bindings: mmc: sdhci-msm: Document the IPQ5018 compatible
mmc: vub300: remove unreachable code
mmc: sdhci-cadence: Support mmc hardware reset
mmc: sdhci-cadence: Add AMD Pensando Elba SoC support
mmc: sdhci-cadence: Support device specific init during probe
mmc: sdhci-cadence: Enable device specific override of writel()
dt-bindings: mmc: cdns: Add AMD Pensando Elba SoC
mmc: core: Remove unused macro mmc_req_rel_wr
mmc: sdhci-of-arasan: Skip setting clock delay for 400KHz
mmc: sdhci-of-arasan: Add support for eMMC5.1 on Xilinx Versal Net platform
dt-bindings: mmc: arasan,sdci: Add Xilinx Versal Net compatible
mmc: sdhci_am654: Add support for PM suspend/resume
mmc: core: remove unnecessary (void*) conversions
dt-bindings: mmc: fsl-imx-esdhc: ref sdhci-common.yaml
mmc: sdhci-of-esdhc: fix quirk to ignore command inhibit for data
mmc: core: Log about empty non-removable slots
dt-bindings: mmc: fujitsu: Add Socionext Synquacer
mmc: sdricoh_cs: remove unused sdricoh_readw function
dt-bindings: mmc: Remove bindings for Intel Thunder Bay SoC"
mmc: sdhci-of-arasan: Remove Intel Thunder Bay SOC support
...
Linus Torvalds [Wed, 26 Apr 2023 00:23:42 +0000 (17:23 -0700)]
Merge tag 'mtd/for-6.4' of git://git./linux/kernel/git/mtd/linux
Pull mtd updates from Miquel Raynal:
"Core MTD changes:
- dt-bindings: Drop unneeded quotes
- mtdblock: Tolerate corrected bit-flips
- Use of_property_read_bool() for boolean properties
- Avoid magic values
- Avoid printing error messages on probe deferrals
- Prepare mtd_otp_nvmem_add() to handle -EPROBE_DEFER
- Fix error path for nvmem provider
- Fix nvmem error reporting
- Provide unique name for nvmem device
MTD device changes:
- lpddr_cmds: Remove unused words variable
- bcm63xxpart: Remove MODULE_LICENSE in non-modules
SPI NOR core changes:
- Introduce Read While Write support for flashes featuring several
banks
- Set the 4-Byte Address Mode method based on SFDP data
- Allow post_sfdp hook to return errors
- Parse SCCR MC table and introduce support for multi-chip devices
SPI NOR manufacturer drivers changes:
- macronix: Add support for mx25uw51245g with RWW
- spansion:
- Determine current address mode at runtime as it can be changed
in a non-volatile way and differ from factory defaults or from
what SFDP advertises.
- Enable JFFS2 write buffer mode for few ECC'd NOR flashes:
S25FS256T, s25hx and s28hx
- Add support for s25hl02gt and s25hs02gt
Raw NAND core changes:
- Convert to platform remove callback returning void
- Fix spelling mistake waifunc() -> waitfunc()
Raw NAND controller driver changes:
- imx: Remove unused is_imx51_nfc and imx53_nfc functions
- omap2: Drop obsolete dependency on COMPILE_TEST
- orion: Use devm_platform_ioremap_resource()
- qcom:
- Use of_property_present() for testing DT property presence
- Use devm_platform_get_and_ioremap_resource()
- stm32_fmc2: Depends on ARCH_STM32 instead of MACH_STM32MP157
- tmio: Remove reference to config MTD_NAND_TMIO in the parsers
Raw NAND manufacturer driver changes:
- hynix: Fix up bit 0 of sdr_timing_mode
SPI-NAND changes:
- Add support for ESMT F50x1G41LB"
* tag 'mtd/for-6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux: (55 commits)
mtd: nand: Convert to platform remove callback returning void
mtd: onenand: omap2: Drop obsolete dependency on COMPILE_TEST
mtd: spi-nor: spansion: Add support for s25hl02gt and s25hs02gt
mtd: spi-nor: spansion: Add a new ->ready() hook for multi-chip device
mtd: spi-nor: spansion: Rework cypress_nor_quad_enable_volatile() for multi-chip device support
mtd: spi-nor: spansion: Rework cypress_nor_get_page_size() for multi-chip device support
mtd: spi-nor: sfdp: Add support for SCCR map for multi-chip device
mtd: spi-nor: Extract volatile register offset from SCCR map
mtd: spi-nor: Allow post_sfdp hook to return errors
mtd: spi-nor: spansion: Rename method to cypress_nor_get_page_size
mtd: spi-nor: spansion: Enable JFFS2 write buffer for S25FS256T
mtd: spi-nor: spansion: Enable JFFS2 write buffer for Infineon s25hx SEMPER flash
mtd: spi-nor: spansion: Enable JFFS2 write buffer for Infineon s28hx SEMPER flash
mtd: spi-nor: spansion: Determine current address mode
mtd: spi-nor: core: Introduce spi_nor_set_4byte_addr_mode()
mtd: spi-nor: core: Update flash's current address mode when changing address mode
mtd: spi-nor: Stop exporting spi_nor_restore()
mtd: spi-nor: Set the 4-Byte Address Mode method based on SFDP data
mtd: spi-nor: core: Make spi_nor_set_4byte_addr_mode_brwr public
mtd: spi-nor: core: Update name and description of spi_nor_set_4byte_addr_mode
...
Linus Torvalds [Wed, 26 Apr 2023 00:18:18 +0000 (17:18 -0700)]
Merge tag 'gpio-updates-for-v6.4' of git://git./linux/kernel/git/brgl/linux
Pull gpio updates from Bartosz Golaszewski:
"We have some new drivers, significant refactoring of existing intel
platforms, lots of improvements all around, mass conversion to using
immutable irqchips by drivers that had not been converted individually
yet and some changes in the core library code.
Summary:
New drivers:
- add a driver for the Loongson GPIO controller
- add a driver for the fxl6408 I2C GPIO expander
- add a GPIO module containing code common for Intel Elkhart Lake and
Merrifield platforms
- add a driver for the Intel Elkhart Lake platform reusing the code
from the intel tangier library
GPIOLIB core:
- GPIO ACPI improvements
- simplify gpiochip_add_data_with_keys() fwnode handling
- cleanup header inclusions (remove unneeded ones, order the rest
alphabetically)
- remove duplicate code (reuse krealloc() instead of open-coding it,
drop a duplicated check in gpiod_find_and_request())
- reshuffle the code to remove unnecessary forward declarations
- coding style cleanups and improvements
- add a helper for accessing device fwnodes
- small updates in docs
Driver improvements:
- convert all remaining GPIO irqchip drivers to using immutable
irqchips
- drop unnecessary of_match_ptr() macro expansions
- shrink the code in gpio-merrifield significantly by reusing the
code from gpio-tangier + minor tweaks to the driver code
- remove MODULE_LICENSE() from drivers that can only be built-in
- add device-tree support to gpio-loongson1
- use new regmap features in gpio-104-dio-48e and gpio-pcie-idio-24
- minor tweaks and fixes to gpio-xra1403, gpio-sim, gpio-tegra194,
gpio-omap, gpio-aspeed, gpio-raspberrypi-exp
- shrink code in gpio-ich and gpio-pxa
- Kconfig tweak for gpio-pmic-eic-sprd"
* tag 'gpio-updates-for-v6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux: (99 commits)
gpio: gpiolib: Simplify gpiochip_add_data_with_key() fwnode
gpiolib: Add gpiochip_set_data() helper
gpiolib: Move gpiochip_get_data() higher in the code
gpiolib: Check array_info for NULL only once in gpiod_get_array()
gpiolib: Replace open coded krealloc()
gpiolib: acpi: Add a ignore wakeup quirk for Clevo NL5xNU
gpiolib: acpi: Move ACPI device NULL check to acpi_get_driver_gpio_data()
gpiolib: acpi: use the fwnode in acpi_gpiochip_find()
gpio: mm-lantiq: Fix typo in the newly added header filename
sh: mach-x3proto: Add missing #include <linux/gpio/driver.h>
powerpc/40x: Add missing select OF_GPIO_MM_GPIOCHIP
gpio: xlp: Convert to immutable irq_chip
gpio: xilinx: Convert to immutable irq_chip
gpio: xgs-iproc: Convert to immutable irq_chip
gpio: visconti: Convert to immutable irq_chip
gpio: tqmx86: Convert to immutable irq_chip
gpio: thunderx: Convert to immutable irq_chip
gpio: stmpe: Convert to immutable irq_chip
gpio: siox: Convert to immutable irq_chip
gpio: rda: Convert to immutable irq_chip
...
Linus Torvalds [Wed, 26 Apr 2023 00:13:47 +0000 (17:13 -0700)]
Merge tag 'regulator-v6.4' of git://git./linux/kernel/git/broonie/regulator
Pull regulator updates from Mark Brown:
"A fairly quiet release, there were some cleanup and a couple of new
devices but the biggest change was converting most of the drivers to
use asynchronous probe. This allows us to ramp up multiple regulators
in parallel during boot which can have a noticable impact on modern
systems.
Summary:
- Update of drivers to PROBE_PREFER_ASYNCHRONOUS to mitigate issues
with ramp times slowing down boots.
- Convert to void remove callbacks.
- Support for voltage monitoring on DA9063
- Support for Qualcomm PMC8180 and PMM8654au, Richtek RT4803 and
RT5739, Rockchip RK860x"
* tag 'regulator-v6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator: (46 commits)
regulator: dt-bindings: qcom,rpmh: Combine PM6150L and PM8150L if-then
regulator: core: Make regulator_lock_two() logic easier to follow
regulator: dt-bindings: qcom,rpmh: Correct PM8550 family supplies
regulator: stm32-pwr: fix of_iomap leak
dt-bindings: mfd: dlg,da9063: document voltage monitoring
regulator: da9063: implement setter for voltage monitoring
regulator: da9063: add voltage monitoring registers
regulator: fan53555: Add support for RK860X
regulator: fan53555: Use dev_err_probe
regulator: fan53555: Improve vsel_mask computation
regulator: fan53555: Make use of the bit macros
regulator: fan53555: Remove unused *_SLEW_SHIFT definitions
regulator: dt-bindings: fcs,fan53555: Add support for RK860X
regulator: qcom_smd: Add MP5496 S1 regulator
regulator: qcom_smd: Add s1 sub-node to mp5496 regulator
regulator: qcom,rpmh: add compatible for pmm8654au RPMH
regulator: qcom-rpmh: add support for pmm8654au regulators
regulator: core: Avoid lockdep reports when resolving supplies
regulator: core: Consistently set mutex_owner when using ww_mutex_lock_slow()
regulator: dt-bindings: qcom,rpmh: Add compatible for PMC8180
...
Linus Torvalds [Wed, 26 Apr 2023 00:09:34 +0000 (17:09 -0700)]
Merge tag 'regmap-v6.4' of git://git./linux/kernel/git/broonie/regmap
Pull regmap updates from Mark Brown:
"This is a much bigger change for regmap than is normal, the main
things being the addition of some KUnit coverage and a maple tree
based register cache which longer term is likely to replace the rbtree
cache except possibly for very small register maps.
While it's complete overkill for most applications the code for maple
trees is there and there are some larger, sparser devices where the
data structure is a better fit.
The maple tree support is still a work in progress but already useful,
there's some conversions of drivers ready to go after the merge
window.
Summary:
- Support for shifting register addresses up as well as down, there's
a use cases with memory mapped MDIO.
- Refactoring of the type configuration in regmap-irq to allow access
to driver data in the handler, needed by some GPIO devices.
- Some initial KUnit coverage, the bulk of the driver facing API is
covered but there's holes and things like the data marshalling for
bytestream buses are just not covered in the slightest.
- Removal of the compressed cache type, it had zero users and was
getting in the way of KUnit.
- Addition of a maple tree based register cache, there's more work to
do but it's already useful for some devices with a flatter data
structure than rbtree and getting to use all the optimisation work
Liam is doing"
* tag 'regmap-v6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap:
regmap: allow upshifting register addresses before performing operations
regmap: Pass irq_drv_data as a parameter for set_type_config()
regmap: Use mas_walk() instead of mas_find()
regmap: Fix double unlock in the maple cache
regmap: Add maple tree based register cache
regmap: Factor out single value register syncing
regmap: Add some basic kunit tests
regmap: Add RAM backed register map
regmap: Removed compressed cache support
regmap: Support paging for buses with reg_read()/reg_write()
regmap: Clarify error for unknown cache types
regmap: Handle sparse caches in the default sync
regmap: add a helper to translate the register address
regmap: cache: Silence checkpatch warning
regmap: cache: Return error in cache sync operations for REGCACHE_NONE
regmap-irq: Place kernel doc of struct regmap_irq_chip in order
regmap-irq: Add no_status support
regmap: sdw: Remove 8-bit value size restriction
regmap: sdw: Update misleading comment
Linus Torvalds [Tue, 25 Apr 2023 23:59:48 +0000 (16:59 -0700)]
Merge tag 'platform-drivers-x86-v6.4-1' of git://git./linux/kernel/git/pdx86/platform-drivers-x86
Pull x86 platform driver updates from Hans de Goede:
- AMD PMC and PMF drivers:
- Numerous bugfixes
- Intel Speed Select Technology (ISST):
- TPMI (Topology Aware Register and PM Capsule Interface) support
for ISST support on upcoming processor models
- Various other improvements / new hw support
- tools/intel-speed-select: TPMI support + other improvements
- Intel In Field Scan (IFS):
- Add Array Bist test support
- New drivers:
- intel_bytcrc_pwrsrc Crystal Cove PMIC pwrsrc / reset-reason driver
- lenovo-ymc Yoga Mode Control driver for reporting SW_TABLET_MODE
- msi-ec Driver for MSI laptop EC features like battery charging limits
- apple-gmux:
- Support for new MMIO based models (T2 Macs)
- Honor acpi_backlight= auto-detect-code + kernel cmdline option
to switch between gmux and apple_bl backlight drivers and remove
own custom handling for this
- x86-android-tablets: Refactor / cleanup + new hw support
- Miscellaneous other cleanups / fixes
* tag 'platform-drivers-x86-v6.4-1' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86: (178 commits)
platform/x86: x86-android-tablets: Add accelerometer support for Yoga Tablet 2 1050/830 series
platform/x86: x86-android-tablets: Add "yogabook-touch-kbd-digitizer-switch" pdev for Lenovo Yoga Book
platform/x86: x86-android-tablets: Add Wacom digitizer info for Lenovo Yoga Book
platform/x86: x86-android-tablets: Update Yoga Book HiDeep touchscreen comment
platform/x86: thinkpad_acpi: Fix Embedded Controller access on X380 Yoga
platform/x86/intel/sdsi: Change mailbox timeout
platform/x86/intel/pmt: Ignore uninitialized entries
platform/x86: amd: pmc: provide user message where s0ix is not supported
platform/x86/amd: pmc: Fix memory leak in amd_pmc_stb_debugfs_open_v2()
mlxbf-bootctl: Add sysfs file for BlueField boot fifo
platform/x86: amd: pmc: Remove __maybe_unused from amd_pmc_suspend_handler()
platform/x86/intel/pmc/mtl: Put GNA/IPU/VPU devices in D3
platform/x86/amd: pmc: Move out of BIOS SMN pair for STB init
platform/x86/amd: pmc: Utilize SMN index 0 for driver probe
platform/x86/amd: pmc: Move idlemask check into `amd_pmc_idlemask_read`
platform/x86/amd: pmc: Don't dump data after resume from s0i3 on picasso
platform/x86/amd: pmc: Hide SMU version and program attributes for Picasso
platform/x86/amd: pmc: Don't try to read SMU version on Picasso
platform/x86/amd/pmf: Move out of BIOS SMN pair for driver probe
platform/x86: intel-uncore-freq: Add client processors
...
Linus Torvalds [Tue, 25 Apr 2023 23:56:53 +0000 (16:56 -0700)]
Merge tag 'tag-chrome-platform-for-v6.4' of git://git./linux/kernel/git/chrome-platform/linux
Pull chrome platform updates from Tzung-Bi Shih:
"Improvements:
- Replace fake flexible arrays with flexible-array member
Misc:
- Minor cleanups and fixes"
* tag 'tag-chrome-platform-for-v6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/chrome-platform/linux:
platform/chrome: wilco_ec: remove return value check of debugfs_create_dir()
platform/chrome: cros_ec_debugfs: fix kernel-doc warning
platform/chrome: cros_ec: Separate logic for getting panic info
platform/chrome: cros_typec_switch: Add missing fwnode_handle_put()
platform/chrome: cros_ec: remove unneeded label and if-condition
platform/chrome: Replace fake flexible arrays with flexible-array member
Linus Torvalds [Tue, 25 Apr 2023 23:27:13 +0000 (16:27 -0700)]
Merge tag 'media/v6.4-1' of git://git./linux/kernel/git/mchehab/linux-media
Pull media updates from Mauro Carvalho Chehab:
- Removal of some old unused sensor drivers: ad9389b, m5mols, mt9m032,
mt9t001, noon010pc30, s5k6aa, sr030pc30 and vs6624
- New i.MX8 image sensor interface driver
- Some new RC keymaps
- lots of cleanups at atomisp driver to make it support standard
features present on other webcam drivers
- the cx18 and saa7146 now uses VB2
- lots of cleanups and driver improvements
* tag 'media/v6.4-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media: (460 commits)
media: ov5670: Fix probe on ACPI
media: nxp: imx8-isi: Remove 300ms sleep after enabling channel
media: nxp: imx8-isi: Replace udelay() with fsleep()
media: nxp: imx8-isi: Drop partial support for i.MX8QM and i.MX8QXP
media: nxp: Add i.MX8 ISI driver
media: dt-bindings: media: Add i.MX8 ISI DT bindings
media: atomisp: gmin_platform: Add Lenovo Ideapad Miix 310 gmin_vars
media: atomisp: gmin_platform: Make DMI quirks take precedence over the _DSM table
media: atomisp: Remove struct atomisp_sub_device index field
media: atomisp: Drop support for streaming from 2 sensors at once
media: atomisp: Remove atomisp_try_fmt() call from atomisp_set_fmt()
media: atomisp: Remove unused ATOM_ISP_MAX_WIDTH_TMP and ATOM_ISP_MAX_HEIGHT_TMP
media: atomisp: Remove snr_mbus_fmt local var from atomisp_try_fmt()
media: atomisp: Remove custom V4L2_CID_FMT_AUTO control
media: atomisp: Remove continuous mode related code from atomisp_set_fmt()
media: atomisp: Remove duplicate atomisp_[start|stop]_streaming() prototypes
media: atomisp: gc0310: Switch over to ACPI powermanagement
media: atomisp: gc0310: Use devm_kzalloc() for data struct
media: atomisp: gc0310: Add runtime-pm support
media: atomisp: gc0310: Delay power-on till streaming is started
...
Linus Torvalds [Tue, 25 Apr 2023 23:12:15 +0000 (16:12 -0700)]
Merge tag 'drm-next-2023-04-24' of git://anongit.freedesktop.org/drm/drm
Pull drm updates from Dave Airlie:
"There is a new Qualcomm accel driver for their QAIC, dma-fence got a
deadline feature added, lots of refactoring around fbdev emulation,
and the usual pre-release hw enablements from AMD and Intel and fixes
everywhere.
New drivers:
- add QAIC acceleration driver
dma-buf:
- constify kobj_type structs
- Reject prime DMA-Buf attachment if get_sg_table is missing.
fbdev:
- cmdline parser fixes
- implement fbdev emulation for GEM DMA drivers
- always use shadow buffer in fbdev emulation helpers
dma-fence:
- add deadline hint to fences
- signal private stub fence
core:
- improve DisplayID 2.0 and EDID parsing
- add gem eviction function + callback
- prep to convert shmem helper to GEM resv lock
- move suballocator from radeon/amdgpu to core for Xe
- HPD polling fixes
- Documentation improvements
- Add atomic enable_plane callback
- use tgid instead of pid for client tracking
- DP: Add SDP Error Detection Configuration Register
- Add prime import/export to vram-helper
- use pci aperture helpers in more drivers
panel:
- Radxa 8/10HD support
- Samsung AMD495QA01 support
- Elida KD50T048A
- Sony TD4353
- Novatek NT36523
- STARRY 2081101QFH032011-53G
- B133UAN01.0
- AUO NE135FBM-N41
i915:
- More MTL enabling
- fix s/r problems with MEI/PXP
- Implement fb_dirty for PSR,FBC,DRRS fixes
- Fix eDP+DSI dual panel systems
- Fix issue #6333: "list_add corruption" and full system lockup from
performance monitoring
- Don't use stolen memory or BAR for ring buffers on LLC platforms
- Make sure DSM size has correct 1MiB granularity on Gen12+
- Whitelist COMMON_SLICE_CHICKEN3 for UMD access on Gen12+
- Add engine TLB invalidation for Meteorlake
- Fix GSC races on driver load/unload on Meteorlake+
- Make kobj_type structures constant
- Move fd_install after last use of fence
- wm/vblank refactoring
- display code refactoring
- Create GSC submission targeting HDCP and PXP usages on MTL+
- Enable HDCP2.x via GSC CS
- Fix context runtime accounting on sysfs fdinfo for heavy workloads
- Use i915 instead of dev_priv insied the file_priv structure
- Replace fake flex-array with flexible-array member
amdgpu:
- Make kobj structures const
- Generalize dmabuf import to work with KFD
- Add capped/uncapped workload handling for supported APUs
- Expose additional memory stats via fdinfo
- Register vga_switcheroo for apple-gmux
- Initial NBIO7.9, GC 9.4.3, GFXHUB 1.2, MMHUB 1.8 support
- Initial DC FAM infrastructure
- Link DC backlight to connector device rather than PCI device
- Add sysfs nodes for secondary VCN clocks
amdkfd:
- Make kobj structures const
- Support for exporting buffers via dmabuf
- Multi-VMA page migration fixes
- initial GC 9.4.3 support
radeon:
- iMac fix
- convert to client based fbdev emulation
habanalabs:
- Add opcodes to the CS ioctl to allow user to stall/resume specific
engines inside Gaudi2.
- INFO ioctl the amount of device memory that the driver and f/w
reserve for themselves.
- INFO ioctl a bit-mask of the available rotator engines
- INFO ioctl the register's address of the f/w that should be used to
trigger interrupts
- INFO ioctl two new opcodes to fetch information on h/w and f/w
events
- Enable graceful reset mechanism for compute-reset.
- Align to the latest firmware specs.
- Enforce the release order of the compute device and dma-buf.
msm:
- UBWC decoder programming rework
- SM8550, SM8450 bindings update
- uapi C++ fix
- a3xx and a4xx devfreq support
- GPU and GEM updates to avoid allocations which could trigger
reclaim (shrinker) in fence signaling path
- dma-fence deadline hint support and wait-boost
- a640/650 speed bin support
cirrus:
- convert to regular atomic helpers
- add damage clipping
mediatek:
- 10-bit overlay support
- mt8195 support
- Only trigger DRM HPD events if bridge is attached
- Change the aux retries times when receiving AUX_DEFER
rockchip:
- add 4K support
vc4:
- use drm_gem_objects
virtio:
- allow KMS support to be disabled
- add damage clipping
vmwgfx:
- buffer object lifetime fixes
exynos:
- move MIPI DSI driver to drm bridge for iMX sharing
- use kernel fbdev emulation
panfrost:
- add support for mali MT81xx devices
- add speed binning support
lima:
- add usage stats
tegra:
- fbdev client conversion
vkms:
- Add primary plane positioning support"
* tag 'drm-next-2023-04-24' of git://anongit.freedesktop.org/drm/drm: (1495 commits)
drm/i915/dp_mst: Fix active port PLL selection for secondary MST streams
drm/exynos: Implement fbdev emulation as in-kernel client
drm/exynos: Initialize fbdev DRM client
drm/exynos: Remove fb_helper from struct exynos_drm_private
drm/exynos: Remove struct exynos_drm_fbdev
drm/exynos: Remove exynos_gem from struct exynos_drm_fbdev
drm/i915: Fix memory leaks in i915 selftests
drm/i915: Make intel_get_crtc_new_encoder() less oopsy
drm/i915/gt: Avoid out-of-bounds access when loading HuC
drm/amdgpu: add some basic elements for multiple XCD case
drm/amdgpu: move vmhub out of amdgpu_ring_funcs (v4)
Revert "drm/amdgpu: enable ras for mp0 v13_0_10 on SRIOV"
drm/amdgpu: add common ip block for GC 9.4.3
drm/amd/display: Add logging when DP link training Clock recovery is Successful
drm/amdgpu: add common early init support for GC 9.4.3
drm/amdgpu: switch to v9_4_3 gfx_funcs callbacks for GC 9.4.3
drm/amd/display: Add logging when setting DP sink power state fails
drm/amdkfd: Add gfx_target_version for GC 9.4.3
drm/amdkfd: Enable HW_UPDATE_RPTR on GC 9.4.3
drm/amdgpu: reserve the old gc_11_0_*_mes.bin
...
Linus Torvalds [Tue, 25 Apr 2023 20:00:41 +0000 (13:00 -0700)]
Merge tag 'slab-for-6.4' of git://git./linux/kernel/git/vbabka/slab
Pull slab updates from Vlastimil Babka:
"The main change is naturally the SLOB removal. Since its deprecation
in 6.2 I've seen no complaints so hopefully SLUB_(TINY) works well for
everyone and we can proceed.
Besides the code cleanup, the main immediate benefit will be allowing
kfree() family of function to work on kmem_cache_alloc() objects,
which was incompatible with SLOB. This includes kfree_rcu() which had
no kmem_cache_free_rcu() counterpart yet and now it shouldn't be
necessary anymore.
Besides that, there are several small code and comment improvements
from Thomas, Thorsten and Vernon"
* tag 'slab-for-6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab:
mm/slab: document kfree() as allowed for kmem_cache_alloc() objects
mm/slob: remove slob.c
mm/slab: remove CONFIG_SLOB code from slab common code
mm, pagemap: remove SLOB and SLQB from comments and documentation
mm, page_flags: remove PG_slob_free
mm/slob: remove CONFIG_SLOB
mm/slub: fix help comment of SLUB_DEBUG
mm: slub: make kobj_type structure constant
slab: Adjust comment after refactoring of gfp.h
Linus Torvalds [Tue, 25 Apr 2023 19:51:51 +0000 (12:51 -0700)]
Merge tag 'livepatching-for-6.4' of git://git./linux/kernel/git/livepatching/livepatching
Pull livepatching updates from Petr Mladek:
- Code and documentation cleanup
* tag 'livepatching-for-6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/livepatching/livepatching:
livepatch: Make kobj_type structures constant
livepatch: fix ELF typos
Linus Torvalds [Tue, 25 Apr 2023 19:46:48 +0000 (12:46 -0700)]
Merge tag 'printk-for-6.4' of git://git./linux/kernel/git/printk/linux
Pull printk updates from Petr Mladek:
- Code cleanup and dead code removal
* tag 'printk-for-6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux:
printk: Remove obsoleted check for non-existent "user" object
lib/vsprintf: Use isodigit() for the octal number check
Remove orphaned CONFIG_PRINTK_SAFE_LOG_BUF_SHIFT
Linus Torvalds [Tue, 25 Apr 2023 19:39:01 +0000 (12:39 -0700)]
Merge tag 'arm64-upstream' of git://git./linux/kernel/git/arm64/linux
Pull arm64 updates from Will Deacon:
"ACPI:
- Improve error reporting when failing to manage SDEI on AGDI device
removal
Assembly routines:
- Improve register constraints so that the compiler can make use of
the zero register instead of moving an immediate #0 into a GPR
- Allow the compiler to allocate the registers used for CAS
instructions
CPU features and system registers:
- Cleanups to the way in which CPU features are identified from the
ID register fields
- Extend system register definition generation to handle Enum types
when defining shared register fields
- Generate definitions for new _EL2 registers and add new fields for
ID_AA64PFR1_EL1
- Allow SVE to be disabled separately from SME on the kernel
command-line
Tracing:
- Support for "direct calls" in ftrace, which enables BPF tracing for
arm64
Kdump:
- Don't bother unmapping the crashkernel from the linear mapping,
which then allows us to use huge (block) mappings and reduce TLB
pressure when a crashkernel is loaded.
Memory management:
- Try again to remove data cache invalidation from the coherent DMA
allocation path
- Simplify the fixmap code by mapping at page granularity
- Allow the kfence pool to be allocated early, preventing the rest of
the linear mapping from being forced to page granularity
Perf and PMU:
- Move CPU PMU code out to drivers/perf/ where it can be reused by
the 32-bit ARM architecture when running on ARMv8 CPUs
- Fix race between CPU PMU probing and pKVM host de-privilege
- Add support for Apple M2 CPU PMU
- Adjust the generic PERF_COUNT_HW_BRANCH_INSTRUCTIONS event
dynamically, depending on what the CPU actually supports
- Minor fixes and cleanups to system PMU drivers
Stack tracing:
- Use the XPACLRI instruction to strip PAC from pointers, rather than
rolling our own function in C
- Remove redundant PAC removal for toolchains that handle this in
their builtins
- Make backtracing more resilient in the face of instrumentation
Miscellaneous:
- Fix single-step with KGDB
- Remove harmless warning when 'nokaslr' is passed on the kernel
command-line
- Minor fixes and cleanups across the board"
* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (72 commits)
KVM: arm64: Ensure CPU PMU probes before pKVM host de-privilege
arm64: kexec: include reboot.h
arm64: delete dead code in this_cpu_set_vectors()
arm64/cpufeature: Use helper macro to specify ID register for capabilites
drivers/perf: hisi: add NULL check for name
drivers/perf: hisi: Remove redundant initialized of pmu->name
arm64/cpufeature: Consistently use symbolic constants for min_field_value
arm64/cpufeature: Pull out helper for CPUID register definitions
arm64/sysreg: Convert HFGITR_EL2 to automatic generation
ACPI: AGDI: Improve error reporting for problems during .remove()
arm64: kernel: Fix kernel warning when nokaslr is passed to commandline
perf/arm-cmn: Fix port detection for CMN-700
arm64: kgdb: Set PSTATE.SS to 1 to re-enable single-step
arm64: move PAC masks to <asm/pointer_auth.h>
arm64: use XPACLRI to strip PAC
arm64: avoid redundant PAC stripping in __builtin_return_address()
arm64/sme: Fix some comments of ARM SME
arm64/signal: Alloc tpidr2 sigframe after checking system_supports_tpidr2()
arm64/signal: Use system_supports_tpidr2() to check TPIDR2
arm64/idreg: Don't disable SME when disabling SVE
...
Linus Torvalds [Tue, 25 Apr 2023 19:22:11 +0000 (12:22 -0700)]
Merge tag 'asm-generic-6.4' of git://git./linux/kernel/git/arnd/asm-generic
Pull asm-generic updates from Arnd Bergmann:
"These are various cleanups, fixing a number of uapi header files to no
longer reference CONFIG_* symbols, and one patch that introduces the
new CONFIG_HAS_IOPORT symbol for architectures that provide working
inb()/outb() macros, as a preparation for adding driver dependencies
on those in the following release"
* tag 'asm-generic-6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic:
Kconfig: introduce HAS_IOPORT option and select it as necessary
scripts: Update the CONFIG_* ignore list in headers_install.sh
pktcdvd: Remove CONFIG_CDROM_PKTCDVD_WCACHE from uapi header
Move bp_type_idx to include/linux/hw_breakpoint.h
Move ep_take_care_of_epollwakeup() to fs/eventpoll.c
Move COMPAT_ATM_ADDPARTY to net/atm/svc.c
Linus Torvalds [Tue, 25 Apr 2023 19:11:54 +0000 (12:11 -0700)]
Merge tag 'soc-dt-6.4' of git://git./linux/kernel/git/soc/soc
Pull ARM SoC devicetree updates from Arnd Bergmann:
"The devicetree changes overall are again dominated by the Qualcomm
Snapdragon platform that weighs in at over 300 changesets, but there
are many updates across other platforms as well, notably Mediatek,
NXP, Rockchips, Renesas, TI, Samsung and ST Microelectronics. These
all add new features for existing machines, as well as new machines
and SoCs.
The newly added SoCs are:
- Allwinner T113-s, an Cortex-A7 based variant of the RISC-V based D1
chip.
- StarFive JH7110, a RISC-V SoC based on the Sifive U74 core like its
JH7100 predecessor, but with additional CPU cores and a GPU.
- Apple M2 as used in current Macbook Air/Pro and Mac Mini gets
added, with comparable support as its M1 predecessor.
- Unisoc UMS512 (Tiger T610) is a midrange smartphone SoC
- Qualcomm IPQ5332 and IPQ9574 are Wi-Fi 7 networking SoCs, based on
the Cortex-A53 and Cortex-A73 cores, respectively.
- Qualcomm sa8775p is an automotive SoC derived from the Snapdragon
family.
Including the initial board support for the added SoC platforms, there
are 52 new machines. The largest group are 19 boards industrial
embedded boards based on the NXP i.MX6 (32-bit) and i.MX8 (64-bit)
families.
Others include:
- Two boards based on the Allwinner f1c200s ultra-low-cost chip
- Three 'Banana Pi' variants based on the Amlogic g12b (A311D, S922X)
SoC.
- The Gl.Inet mv1000 router based on Marvell Armada 3720
- A Wifi/LTE Dongle based on Qualcomm msm8916
- Two robotics boards based on Qualcomm QRB chips
- Three Snapdragon based phones made by Xiaomi
- Five developments boards based on various Rockchip SoCs, including
the rk3588s-khadas-edge2 and a few NanoPi models
- The AM625 Beagleplay industrial SBC
Another 14 machines get removed: both boards for the obsolete 'oxnas'
platform, three boards for the Renesas r8a77950 SoC that were only for
pre-production chips, and various chromebook models based on the
Qualcomm Sc7180 'trogdor' design that were never part of products"
* tag 'soc-dt-6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (836 commits)
arm64: dts: rockchip: Add support for volume keys to rk3399-pinephone-pro
arm64: dts: rockchip: Add vdd_cpu_big regulators to rk3588-rock-5b
arm64: dts: rockchip: Use generic name for es8316 on Pinebook Pro and Rock 5B
arm64: dts: rockchip: Drop RTC clock-frequency on rk3588-rock-5b
arm64: dts: apple: t8112: Add PWM controller
arm64: dts: apple: t600x: Add PWM controller
arm64: dts: apple: t8103: Add PWM controller
arm64: dts: rockchip: Add pinctrl gpio-ranges for rk356x
ARM: dts: nomadik: Replace deprecated spi-gpio properties
ARM: dts: aspeed-g6: Add UDMA node
ARM: dts: aspeed: greatlakes: add mctp device
ARM: dts: aspeed: greatlakes: Add gpio names
ARM: dts: aspeed: p10bmc: Change power supply info
arm64: dts: mediatek: mt6795-xperia-m5: Add Bosch BMM050 Magnetometer
arm64: dts: mediatek: mt6795-xperia-m5: Add Bosch BMA255 Accelerometer
arm64: dts: mediatek: mt6795: Add tertiary PWM node
arm64: dts: rockchip: add panel to Anbernic RG353 series
dt-bindings: arm: Add Data Modul i.MX8M Plus eDM SBC
dt-bindings: arm: fsl: Add chargebyte Tarragon
dt-bindings: vendor-prefixes: add chargebyte
...
Linus Torvalds [Tue, 25 Apr 2023 19:09:54 +0000 (12:09 -0700)]
Merge tag 'soc-defconfig-6.4' of git://git./linux/kernel/git/soc/soc
Pull ARM SoC defconfig updates from Arnd Bergmann:
"Most of the changes just enable additional device drivers that were
added or that are often used on major platforms.
The virtconfig added last time now disables additional drivers to
shrink kernels for virtual machines.
The obsolete oxnas_v6_defconfig file is removed in turn"
* tag 'soc-defconfig-6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (33 commits)
ARM: config: Update Vexpress defconfig
arm64: defconfig: enable building the nvmem-reboot-mode module
arm64: defconfig: Enable TI ADC driver
arm64: defconfig: Enable TI TSCADC driver
arm64: defconfig: Enable security accelerator driver for TI K3 SoCs
arm64: defconfig: Enable crypto test module
ARM: multi_v7_defconfig: Add OPTEE support
ARM: configs: Update U8500 defconfig
ARM: imx_v4_v5_defconfig: Build CONFIG_IMX_SDMA as module
arm64: defconfig: Enable IPQ9574 SoC base configs
ARM: imx_v6_v7_defconfig: Enable Tarragon peripheral drivers
arm64: defconfig: Enable ARM CoreSight PMU driver
arm64: defconfig: remove duplicate TYPEC_UCSI & QCOM_PMIC_GLINK
ARM: configs: remove oxnas_v6_defconfig
arm64: defconfig: Enable audio drivers for AM62-SK
arm64: defconfig: Enable drivers for BeaglePlay
ARM: imx_v6_v7_defconfig: Select CONFIG_DRM_I2C_NXP_TDA998X
arm64: defconfig: Enable Virtio RNG driver as built in
arm64: defconfig: Enable CAN PHY transceiver driver
arm64: defconfig: add PMIC GLINK modules
...
Linus Torvalds [Tue, 25 Apr 2023 19:02:16 +0000 (12:02 -0700)]
Merge tag 'soc-drivers-6.4' of git://git./linux/kernel/git/soc/soc
Pull ARM SoC driver updates from Arnd Bergmann:
"The most notable updates this time are for Qualcomm Snapdragon
platforms. The Inline-Crypto-Engine gets a new DT binding and driver,
and a number of drivers now support additional Snapdragon variants, in
particular the rsc, scm, geni, bwm, glink and socinfo, while the llcc
(edac) and rpm drivers get notable functionality updates.
Updates on other platforms include:
- Various updates to the Mediatek mutex and mmsys drivers, including
support for the Helio X10 SoC
- Support for unidirectional mailbox channels in Arm SCMI firmware
- Support for per cpu asynchronous notification in OP-TEE firmware
- Minor updates for memory controller drivers.
- Minor updates for Renesas, TI, Amlogic, Apple, Broadcom, Tegra,
Allwinner, Versatile Express, Canaan, Microchip, Mediatek and i.MX
SoC drivers, mainly updating the use of MODULE_LICENSE() macros and
obsolete DT driver interfaces"
* tag 'soc-drivers-6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (165 commits)
soc: ti: smartreflex: Simplify getting the opam_sr pointer
bus: vexpress-config: Add explicit of_platform.h include
soc: mediatek: Kconfig: Add MTK_CMDQ dependency to MTK_MMSYS
memory: mtk-smi: mt8365: Add SMI Support
dt-bindings: memory-controllers: mediatek,smi-larb: add mt8365
dt-bindings: memory-controllers: mediatek,smi-common: add mt8365
memory: tegra: read values from correct device
dt-bindings: crypto: Add Qualcomm Inline Crypto Engine
soc: qcom: Make the Qualcomm UFS/SDCC ICE a dedicated driver
dt-bindings: firmware: document Qualcomm QCM2290 SCM
soc: qcom: rpmh-rsc: Support RSC v3 minor versions
soc: qcom: smd-rpm: Use GFP_ATOMIC in write path
soc/tegra: fuse: Remove nvmem root only access
soc/tegra: cbb: tegra194: Use of_address_count() helper
soc/tegra: cbb: Remove MODULE_LICENSE in non-modules
ARM: tegra: Remove MODULE_LICENSE in non-modules
soc/tegra: flowctrl: Use devm_platform_get_and_ioremap_resource()
soc: tegra: cbb: Drop empty platform remove function
firmware: arm_scmi: Add support for unidirectional mailbox channels
dt-bindings: firmware: arm,scmi: Support mailboxes unidirectional channels
...
Linus Torvalds [Tue, 25 Apr 2023 18:53:09 +0000 (11:53 -0700)]
Merge tag 'soc-arm-6.4' of git://git./linux/kernel/git/soc/soc
Pull ARM SoC updates from Arnd Bergmann:
"The Oxford Semiconductor OX810/OX820 'Oxnas' platform gets retired
after the ARM11MPcore processor keeps causing problems in certain
corner cases. OX820 was the only remaining SoC with this core after
CNS3xxx got retired, and its driver support was never completely
merged upstream. The Arm 'Realview' reference platform still supports
ARM11MPCore in principle, but this was never a product, and the CPU
support will get cleaned up later on.
Another series updates the mv78xx0 platform, which has been similarly
neglected for a while, but should work properly again now.
The other changes are minor cleanups across platforms, mostly
converting code to more modern interfaces for DT nodes and removing
some more code as a follow-up to the large-scale platform removal in
linux-6.3"
* tag 'soc-arm-6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (28 commits)
ARM: mv78xx0: fix entries for gpios, buttons and usb ports
ARM: mv78xx0: add code to enable XOR and CRYPTO engines on mv78xx0
ARM: mv78xx0: set the correct driver for the i2c RTC
ARM: mv78xx0: adjust init logic for ts-wxl to reflect single core dev
soc: fsl: Use of_property_present() for testing DT property presence
ARM: pxa: Use of_property_read_bool() for boolean properties
firmware: turris-mox-rwtm: make kobj_type structure constant
ARM: oxnas: remove OXNAS support
ARM: sh-mobile: Use of_cpu_node_to_id() to read CPU node 'reg'
ARM: OMAP2+: hwmod: Use kzalloc for allocating only one element
ARM: OMAP2+: Remove the unneeded result variable
ARM: OMAP2+: fix repeated words in comments
ARM: OMAP2+: remove obsolete config OMAP3_SDRC_AC_TIMING
ARM: OMAP2+: Use of_address_to_resource()
ARM: OMAP2+: Use of_property_read_bool() for boolean properties
ARM: omap1: remove redundant variables err
ARM: omap1: Kconfig: Fix indentation
ARM: bcm: Use of_address_to_resource()
ARM: mstar: remove unused config MACH_MERCURY
ARM: spear: remove obsolete config MACH_SPEAR600
...
Linus Torvalds [Tue, 25 Apr 2023 18:39:45 +0000 (11:39 -0700)]
Merge tag 'x86-apic-2023-04-24' of git://git./linux/kernel/git/tip/tip
Pull x86 APIC updates from Thomas Gleixner:
- Fix the incorrect handling of atomic offset updates in
reserve_eilvt_offset()
The check for the return value of atomic_cmpxchg() is not compared
against the old value, it is compared against the new value, which
makes it two round on success.
Convert it to atomic_try_cmpxchg() which does the right thing.
- Handle IO/APIC less systems correctly
When IO/APIC is not advertised by ACPI then the computation of the
lower bound for dynamically allocated interrupts like MSI goes wrong.
This lower bound is used to exclude the IO/APIC legacy GSI space as
that must stay reserved for the legacy interrupts.
In case that the system, e.g. VM, does not advertise an IO/APIC the
lower bound stays at 0.
0 is an invalid interrupt number except for the legacy timer
interrupt on x86. The return value is unchecked in the core code, so
it ends up to allocate interrupt number 0 which is subsequently
considered to be invalid by the caller, e.g. the MSI allocation code.
A similar problem was already cured for device tree based systems
years ago, but that missed - or did not envision - the zero IO/APIC
case.
Consolidate the zero check and return the provided "from" argument to
the core code call site, which is guaranteed to be greater than 0.
- Simplify the X2APIC cluster CPU mask logic for CPU hotplug
Per cluster CPU masks are required for X2APIC in cluster mode to
determine the correct cluster for a target CPU when calculating the
destination for IPIs
These masks are established when CPUs are borught up. The first CPU
in a cluster must allocate a new cluster CPU mask. As this happens
during the early startup of a CPU, where memory allocations cannot be
done, the mask has to be allocated by the control CPU.
The current implementation allocates a clustermask just in case and
if the to be brought up CPU is the first in a cluster the CPU takes
over this allocation from a global pointer.
This works nicely in the fully serialized CPU bringup scenario which
is used today, but would fail completely for parallel bringup of
CPUs.
The cluster association of a CPU can be computed from the APIC ID
which is enumerated by ACPI/MADT.
So the cluster CPU masks can be preallocated and associated upfront
and the upcoming CPUs just need to set their corresponding bit.
Aside of preparing for parallel bringup this is a valuable
simplification on its own.
- Remove global variables which control the early startup of secondary
CPUs on 64-bit
The only information which is needed by a starting CPU is the Linux
CPU number. The CPU number allows it to retrieve the rest of the
required data from already existing per CPU storage.
So instead of initial_stack, early_gdt_desciptor and initial_gs
provide a new variable smpboot_control which contains the Linux CPU
number for now. The starting CPU can retrieve and compute all
required information for startup from there.
Aside of being a cleanup, this is also preparing for parallel CPU
bringup, where starting CPUs will look up their Linux CPU number via
the APIC ID, when smpboot_control has the corresponding control bit
set.
- Make cc_vendor globally accesible
Subsequent parallel bringup changes require access to cc_vendor
because confidental computing platforms need special treatment in the
early startup phase vs. CPUID and APCI ID readouts.
The change makes cc_vendor global and provides stub accessors in case
that CONFIG_ARCH_HAS_CC_PLATFORM is not set.
This was merged from the x86/cc branch in anticipation of further
parallel bringup commits which require access to cc_vendor. Due to
late discoveries of fundamental issue with those patches these
commits never happened.
The merge commit is unfortunately in the middle of the APIC commits
so unraveling it would have required a rebase or revert. As the
parallel bringup seems to be well on its way for 6.5 this would be
just pointless churn. As the commit does not contain any functional
change it's not a risk to keep it.
* tag 'x86-apic-2023-04-24' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/ioapic: Don't return 0 from arch_dynirq_lower_bound()
x86/apic: Fix atomic update of offset in reserve_eilvt_offset()
x86/coco: Export cc_vendor
x86/smpboot: Reference count on smpboot_setup_warm_reset_vector()
x86/smpboot: Remove initial_gs
x86/smpboot: Remove early_gdt_descr on 64-bit
x86/smpboot: Remove initial_stack on 64-bit
x86/apic/x2apic: Allow CPU cluster_mask to be populated in parallel
Linus Torvalds [Tue, 25 Apr 2023 18:22:46 +0000 (11:22 -0700)]
Merge tag 'timers-core-2023-04-24' of git://git./linux/kernel/git/tip/tip
Pull timers and timekeeping updates from Thomas Gleixner:
- Improve the VDSO build time checks to cover all dynamic relocations
VDSO does not allow dynamic relocations, but the build time check is
incomplete and fragile.
It's based on architectures specifying the relocation types to search
for and does not handle R_*_NONE relocation entries correctly.
R_*_NONE relocations are injected by some GNU ld variants if they
fail to determine the exact .rel[a]/dyn_size to cover trailing zeros.
R_*_NONE relocations must be ignored by dynamic loaders, so they
should be ignored in the build time check too.
Remove the architecture specific relocation types to check for and
validate strictly that no other relocations than R_*_NONE end up in
the VSDO .so file.
- Prefer signal delivery to the current thread for
CLOCK_PROCESS_CPUTIME_ID based posix-timers
Such timers prefer to deliver the signal to the main thread of a
process even if the context in which the timer expires is the current
task. This has the downside that it might wake up an idle thread.
As there is no requirement or guarantee that the signal has to be
delivered to the main thread, avoid this by preferring the current
task if it is part of the thread group which shares sighand.
This not only avoids waking idle threads, it also distributes the
signal delivery in case of multiple timers firing in the context of
different threads close to each other better.
- Align the tick period properly (again)
For a long time the tick was starting at CLOCK_MONOTONIC zero, which
allowed users space applications to either align with the tick or to
place a periodic computation so that it does not interfere with the
tick. The alignement of the tick period was more by chance than by
intention as the tick is set up before a high resolution clocksource
is installed, i.e. timekeeping is still tick based and the tick
period advances from there.
The early enablement of sched_clock() broke this alignement as the
time accumulated by sched_clock() is taken into account when
timekeeping is initialized. So the base value now(CLOCK_MONOTONIC) is
not longer a multiple of tick periods, which breaks applications
which relied on that behaviour.
Cure this by aligning the tick starting point to the next multiple of
tick periods, i.e 1000ms/CONFIG_HZ.
- A set of NOHZ fixes and enhancements:
* Cure the concurrent writer race for idle and IO sleeptime
statistics
The statitic values which are exposed via /proc/stat are updated
from the CPU local idle exit and remotely by cpufreq, but that
happens without any form of serialization. As a consequence
sleeptimes can be accounted twice or worse.
Prevent this by restricting the accumulation writeback to the CPU
local idle exit and let the remote access compute the accumulated
value.
* Protect idle/iowait sleep time with a sequence count
Reading idle/iowait sleep time, e.g. from /proc/stat, can race
with idle exit updates. As a consequence the readout may result
in random and potentially going backwards values.
Protect this by a sequence count, which fixes the idle time
statistics issue, but cannot fix the iowait time problem because
iowait time accounting races with remote wake ups decrementing
the remote runqueues nr_iowait counter. The latter is impossible
to fix, so the only way to deal with that is to document it
properly and to remove the assertion in the selftest which
triggers occasionally due to that.
* Restructure struct tick_sched for better cache layout
* Some small cleanups and a better cache layout for struct
tick_sched
- Implement the missing timer_wait_running() callback for POSIX CPU
timers
For unknown reason the introduction of the timer_wait_running()
callback missed to fixup posix CPU timers, which went unnoticed for
almost four years.
While initially only targeted to prevent livelocks between a timer
deletion and the timer expiry function on PREEMPT_RT enabled kernels,
it turned out that fixing this for mainline is not as trivial as just
implementing a stub similar to the hrtimer/timer callbacks.
The reason is that for CONFIG_POSIX_CPU_TIMERS_TASK_WORK enabled
systems there is a livelock issue independent of RT.
CONFIG_POSIX_CPU_TIMERS_TASK_WORK=y moves the expiry of POSIX CPU
timers out from hard interrupt context to task work, which is handled
before returning to user space or to a VM. The expiry mechanism moves
the expired timers to a stack local list head with sighand lock held.
Once sighand is dropped the task can be preempted and a task which
wants to delete a timer will spin-wait until the expiry task is
scheduled back in. In the worst case this will end up in a livelock
when the preempting task and the expiry task are pinned on the same
CPU.
The timer wheel has a timer_wait_running() mechanism for RT, which
uses a per CPU timer-base expiry lock which is held by the expiry
code and the task waiting for the timer function to complete blocks
on that lock.
This does not work in the same way for posix CPU timers as there is
no timer base and expiry for process wide timers can run on any task
belonging to that process, but the concept of waiting on an expiry
lock can be used too in a slightly different way.
Add a per task mutex to struct posix_cputimers_work, let the expiry
task hold it accross the expiry function and let the deleting task
which waits for the expiry to complete block on the mutex.
In the non-contended case this results in an extra
mutex_lock()/unlock() pair on both sides.
This avoids spin-waiting on a task which is scheduled out, prevents
the livelock and cures the problem for RT and !RT systems
* tag 'timers-core-2023-04-24' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
posix-cpu-timers: Implement the missing timer_wait_running callback
selftests/proc: Assert clock_gettime(CLOCK_BOOTTIME) VS /proc/uptime monotonicity
selftests/proc: Remove idle time monotonicity assertions
MAINTAINERS: Remove stale email address
timers/nohz: Remove middle-function __tick_nohz_idle_stop_tick()
timers/nohz: Add a comment about broken iowait counter update race
timers/nohz: Protect idle/iowait sleep time under seqcount
timers/nohz: Only ever update sleeptime from idle exit
timers/nohz: Restructure and reshuffle struct tick_sched
tick/common: Align tick period with the HZ tick.
selftests/timers/posix_timers: Test delivery of signals across threads
posix-timers: Prefer delivery of signals to the current thread
vdso: Improve cmd_vdso_check to check all dynamic relocations
Linus Torvalds [Tue, 25 Apr 2023 18:16:08 +0000 (11:16 -0700)]
Merge tag 'irq-core-2023-04-24' of git://git./linux/kernel/git/tip/tip
Pull interrupt updates from Thomas Gleixner:
"Core:
- Add tracepoints for tasklet callbacks which makes it possible to
analyze individual tasklet functions instead of guess working from
the overall duration of tasklet processing
- Ensure that secondary interrupt threads have their affinity
adjusted correctly
Drivers:
- A large rework of the RISC-V IPI management to prepare for a new
RISC-V interrupt architecture
- Small fixes and enhancements all over the place
- Removal of support for various obsolete hardware platforms and the
related code"
* tag 'irq-core-2023-04-24' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (21 commits)
irqchip/st: Remove stih415/stih416 and stid127 platforms support
irqchip/gic-v3: Add Rockchip 3588001 erratum workaround
genirq: Update affinity of secondary threads
softirq: Add trace points for tasklet entry/exit
irqchip/loongson-pch-pic: Fix pch_pic_acpi_init calling
irqchip/loongson-pch-pic: Fix registration of syscore_ops
irqchip/loongson-eiointc: Fix registration of syscore_ops
irqchip/loongson-eiointc: Fix incorrect use of acpi_get_vec_parent
irqchip/loongson-eiointc: Fix returned value on parsing MADT
irqchip/riscv-intc: Add empty irq_eoi() for chained irq handlers
RISC-V: Use IPIs for remote icache flush when possible
RISC-V: Use IPIs for remote TLB flush when possible
RISC-V: Allow marking IPIs as suitable for remote FENCEs
RISC-V: Treat IPIs as normal Linux IRQs
irqchip/riscv-intc: Allow drivers to directly discover INTC hwnode
RISC-V: Clear SIP bit only when using SBI IPI operations
irqchip/irq-sifive-plic: Add syscore callbacks for hibernation
irqchip: Use of_property_read_bool() for boolean properties
irqchip/bcm-6345-l1: Request memory region
irqchip/gicv3: Workaround for NVIDIA erratum T241-FABRIC-4
...
Linus Torvalds [Tue, 25 Apr 2023 18:05:04 +0000 (11:05 -0700)]
Merge tag 'core-entry-2023-04-24' of git://git./linux/kernel/git/tip/tip
Pull core entry/ptrace update from Thomas Gleixner:
"Provide a ptrace set/get interface for syscall user dispatch. The main
purpose is to enable checkpoint/restore (CRIU) to handle processes
which utilize syscall user dispatch correctly"
* tag 'core-entry-2023-04-24' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
selftest, ptrace: Add selftest for syscall user dispatch config api
ptrace: Provide set/get interface for syscall user dispatch
syscall_user_dispatch: Untag selector address before access_ok()
syscall_user_dispatch: Split up set_syscall_user_dispatch()
Linus Torvalds [Tue, 25 Apr 2023 18:00:45 +0000 (11:00 -0700)]
Merge tag 'core-debugobjects-2023-04-24' of git://git./linux/kernel/git/tip/tip
Pull core debugobjects update from Thomas Gleixner:
"A single update to debugobjects:
Prevent a race vs statically initialized objects. Such objects are
usually not initialized via an init() function. They are special cased
and detected on first use under the assumption that they are already
correctly initialized via the static initializer.
This works correctly unless there are two concurrent debug object
operations on such an object.
The first one detects that the object is not yet tracked and tries to
establish a tracking object after dropping the debug objects hash
bucket lock. The concurrent operation does the same. The one which
wins the race ends up modifying the state of the object which makes
the other one fail resulting in a bogus debug objects warning.
Prevent this by making the detection of a static object and the
allocation of a tracking object atomic under the hash bucket lock. So
the first one to acquire the hash bucket lock will succeed and the
second one will observe the correct tracking state.
This race existed forever but was only exposed when the timer wheel
code added a debug_object_assert_init() call outside of the timer base
locked region. This replaced the previous warning about
timer::function being NULL which had to be removed when the
timer_shutdown() mechanics were added"
* tag 'core-debugobjects-2023-04-24' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
debugobject: Prevent init race with static objects