Alexei Starovoitov [Wed, 10 Aug 2022 17:12:48 +0000 (10:12 -0700)]
Merge branch 'fixes for bpf map iterator'
Hou Tao says:
====================
From: Hou Tao <houtao1@huawei.com>
Hi,
The patchset constitues three fixes for bpf map iterator:
(1) patch 1~4: fix user-after-free during reading map iterator fd
It is possible when both the corresponding link fd and map fd are
closed bfore reading the iterator fd. I had squashed these four patches
into one, but it was not friendly for stable backport, so I break these
fixes into four single patches in the end. Patch 7 is its testing patch.
(2) patch 5: fix invalidity check for values in sk local storage map
Patch 8 adds two tests for it.
(3) patch 6: reject sleepable program for non-resched map iterator
Patch 9 add a test for it.
Please check the individual patches for more details. And comments are
always welcome.
Regards,
Tao
Changes since v2:
* patch 1~6: update commit messages (from Yonghong & Martin)
* patch 7: add more detailed comments (from Yonghong)
* patch 8: use NULL directly instead of (void *)0
v1: https://lore.kernel.org/bpf/
20220806074019.2756957-1-houtao@huaweicloud.com
====================
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Hou Tao [Wed, 10 Aug 2022 08:05:38 +0000 (16:05 +0800)]
selftests/bpf: Ensure sleepable program is rejected by hash map iter
Add a test to ensure sleepable program is rejected by hash map iterator.
Signed-off-by: Hou Tao <houtao1@huawei.com>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/r/20220810080538.1845898-10-houtao@huaweicloud.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Hou Tao [Wed, 10 Aug 2022 08:05:37 +0000 (16:05 +0800)]
selftests/bpf: Add write tests for sk local storage map iterator
Add test to validate the overwrite of sock local storage map value in
map iterator and another one to ensure out-of-bound value writing is
rejected.
Signed-off-by: Hou Tao <houtao1@huawei.com>
Acked-by: Yonghong Song <yhs@fb.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/r/20220810080538.1845898-9-houtao@huaweicloud.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Hou Tao [Wed, 10 Aug 2022 08:05:36 +0000 (16:05 +0800)]
selftests/bpf: Add tests for reading a dangling map iter fd
After closing both related link fd and map fd, reading the map
iterator fd to ensure it is OK to do so.
Signed-off-by: Hou Tao <houtao1@huawei.com>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/r/20220810080538.1845898-8-houtao@huaweicloud.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Hou Tao [Wed, 10 Aug 2022 08:05:35 +0000 (16:05 +0800)]
bpf: Only allow sleepable program for resched-able iterator
When a sleepable program is attached to a hash map iterator, might_fault()
will report "BUG: sleeping function called from invalid context..." if
CONFIG_DEBUG_ATOMIC_SLEEP is enabled. The reason is that rcu_read_lock()
is held in bpf_hash_map_seq_next() and won't be released until all elements
are traversed or bpf_hash_map_seq_stop() is called.
Fixing it by reusing BPF_ITER_RESCHED to indicate that only non-sleepable
program is allowed for iterator without BPF_ITER_RESCHED. We can revise
bpf_iter_link_attach() later if there are other conditions which may
cause rcu_read_lock() or spin_lock() issues.
Signed-off-by: Hou Tao <houtao1@huawei.com>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/r/20220810080538.1845898-7-houtao@huaweicloud.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Hou Tao [Wed, 10 Aug 2022 08:05:34 +0000 (16:05 +0800)]
bpf: Check the validity of max_rdwr_access for sock local storage map iterator
The value of sock local storage map is writable in map iterator, so check
max_rdwr_access instead of max_rdonly_access.
Fixes:
5ce6e77c7edf ("bpf: Implement bpf iterator for sock local storage map")
Signed-off-by: Hou Tao <houtao1@huawei.com>
Acked-by: Yonghong Song <yhs@fb.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/r/20220810080538.1845898-6-houtao@huaweicloud.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Hou Tao [Wed, 10 Aug 2022 08:05:33 +0000 (16:05 +0800)]
bpf: Acquire map uref in .init_seq_private for sock{map,hash} iterator
sock_map_iter_attach_target() acquires a map uref, and the uref may be
released before or in the middle of iterating map elements. For example,
the uref could be released in sock_map_iter_detach_target() as part of
bpf_link_release(), or could be released in bpf_map_put_with_uref() as
part of bpf_map_release().
Fixing it by acquiring an extra map uref in .init_seq_private and
releasing it in .fini_seq_private.
Fixes:
0365351524d7 ("net: Allow iterating sockmap and sockhash")
Signed-off-by: Hou Tao <houtao1@huawei.com>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/r/20220810080538.1845898-5-houtao@huaweicloud.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Hou Tao [Wed, 10 Aug 2022 08:05:32 +0000 (16:05 +0800)]
bpf: Acquire map uref in .init_seq_private for sock local storage map iterator
bpf_iter_attach_map() acquires a map uref, and the uref may be released
before or in the middle of iterating map elements. For example, the uref
could be released in bpf_iter_detach_map() as part of
bpf_link_release(), or could be released in bpf_map_put_with_uref() as
part of bpf_map_release().
So acquiring an extra map uref in bpf_iter_init_sk_storage_map() and
releasing it in bpf_iter_fini_sk_storage_map().
Fixes:
5ce6e77c7edf ("bpf: Implement bpf iterator for sock local storage map")
Signed-off-by: Hou Tao <houtao1@huawei.com>
Acked-by: Yonghong Song <yhs@fb.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/r/20220810080538.1845898-4-houtao@huaweicloud.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Hou Tao [Wed, 10 Aug 2022 08:05:31 +0000 (16:05 +0800)]
bpf: Acquire map uref in .init_seq_private for hash map iterator
bpf_iter_attach_map() acquires a map uref, and the uref may be released
before or in the middle of iterating map elements. For example, the uref
could be released in bpf_iter_detach_map() as part of
bpf_link_release(), or could be released in bpf_map_put_with_uref() as
part of bpf_map_release().
So acquiring an extra map uref in bpf_iter_init_hash_map() and
releasing it in bpf_iter_fini_hash_map().
Fixes:
d6c4503cc296 ("bpf: Implement bpf iterator for hash maps")
Signed-off-by: Hou Tao <houtao1@huawei.com>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/r/20220810080538.1845898-3-houtao@huaweicloud.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Hou Tao [Wed, 10 Aug 2022 08:05:30 +0000 (16:05 +0800)]
bpf: Acquire map uref in .init_seq_private for array map iterator
bpf_iter_attach_map() acquires a map uref, and the uref may be released
before or in the middle of iterating map elements. For example, the uref
could be released in bpf_iter_detach_map() as part of
bpf_link_release(), or could be released in bpf_map_put_with_uref() as
part of bpf_map_release().
Alternative fix is acquiring an extra bpf_link reference just like
a pinned map iterator does, but it introduces unnecessary dependency
on bpf_link instead of bpf_map.
So choose another fix: acquiring an extra map uref in .init_seq_private
for array map iterator.
Fixes:
d3cc2ab546ad ("bpf: Implement bpf iterator for array maps")
Signed-off-by: Hou Tao <houtao1@huawei.com>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/r/20220810080538.1845898-2-houtao@huaweicloud.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Alexei Starovoitov [Tue, 9 Aug 2022 03:58:09 +0000 (20:58 -0700)]
bpf: Disallow bpf programs call prog_run command.
The verifier cannot perform sufficient validation of bpf_attr->test.ctx_in
pointer, therefore bpf programs should not be allowed to call BPF_PROG_RUN
command from within the program.
To fix this issue split bpf_sys_bpf() bpf helper into normal kern_sys_bpf()
kernel function that can only be used by the kernel light skeleton directly.
Reported-by: YiFei Zhu <zhuyifei@google.com>
Fixes:
b1d18a7574d0 ("bpf: Extend sys_bpf commands for bpf_syscall programs.")
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Xu Kuohai [Mon, 8 Aug 2022 04:07:35 +0000 (00:07 -0400)]
bpf, arm64: Fix bpf trampoline instruction endianness
The sparse tool complains as follows:
arch/arm64/net/bpf_jit_comp.c:1684:16:
warning: incorrect type in assignment (different base types)
arch/arm64/net/bpf_jit_comp.c:1684:16:
expected unsigned int [usertype] *branch
arch/arm64/net/bpf_jit_comp.c:1684:16:
got restricted __le32 [usertype] *
arch/arm64/net/bpf_jit_comp.c:1700:52:
error: subtraction of different types can't work (different base
types)
arch/arm64/net/bpf_jit_comp.c:1734:29:
warning: incorrect type in assignment (different base types)
arch/arm64/net/bpf_jit_comp.c:1734:29:
expected unsigned int [usertype] *
arch/arm64/net/bpf_jit_comp.c:1734:29:
got restricted __le32 [usertype] *
arch/arm64/net/bpf_jit_comp.c:1918:52:
error: subtraction of different types can't work (different base
types)
This is because the variable branch in function invoke_bpf_prog and the
variable branches in function prepare_trampoline are defined as type
u32 *, which conflicts with ctx->image's type __le32 *, so sparse complains
when assignment or arithmetic operation are performed on these two
variables and ctx->image.
Since arm64 instructions are always little-endian, change the type of
these two variables to __le32 * and call cpu_to_le32() to convert
instruction to little-endian before writing it to memory. This is also
in line with emit() which internally does cpu_to_le32(), too.
Fixes:
efc9909fdce0 ("bpf, arm64: Add bpf trampoline for arm64")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Xu Kuohai <xukuohai@huawei.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Link: https://lore.kernel.org/bpf/20220808040735.1232002-1-xukuohai@huawei.com
Alexei Starovoitov [Wed, 10 Aug 2022 01:46:12 +0000 (18:46 -0700)]
Merge branch 'Don't reinit map value in prealloc_lru_pop'
Kumar Kartikeya Dwivedi says:
====================
Fix for a bug in prealloc_lru_pop spotted while reading the code, then a test +
example that checks whether it is fixed.
Changelog:
----------
v2 -> v3:
v2: https://lore.kernel.org/bpf/
20220809140615.21231-1-memxor@gmail.com
* Switch test to use kptr instead of kptr_ref to stabilize test runs
* Fix missing lru_bug__destroy (Yonghong)
* Collect Acks
v1 -> v2:
v1: https://lore.kernel.org/bpf/
20220806014603.1771-1-memxor@gmail.com
* Expand commit log to include summary of the discussion with Yonghong
* Make lru_bug selftest serial to not mess up refcount for map_kptr test
====================
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Kumar Kartikeya Dwivedi [Tue, 9 Aug 2022 21:30:33 +0000 (23:30 +0200)]
selftests/bpf: Add test for prealloc_lru_pop bug
Add a regression test to check against invalid check_and_init_map_value
call inside prealloc_lru_pop.
The kptr should not be reset to NULL once we set it after deleting the
map element. Hence, we trigger a program that updates the element
causing its reuse, and checks whether the unref kptr is reset or not.
If it is, prealloc_lru_pop does an incorrect check_and_init_map_value
call and the test fails.
Acked-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20220809213033.24147-4-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Kumar Kartikeya Dwivedi [Tue, 9 Aug 2022 21:30:32 +0000 (23:30 +0200)]
bpf: Don't reinit map value in prealloc_lru_pop
The LRU map that is preallocated may have its elements reused while
another program holds a pointer to it from bpf_map_lookup_elem. Hence,
only check_and_free_fields is appropriate when the element is being
deleted, as it ensures proper synchronization against concurrent access
of the map value. After that, we cannot call check_and_init_map_value
again as it may rewrite bpf_spin_lock, bpf_timer, and kptr fields while
they can be concurrently accessed from a BPF program.
This is safe to do as when the map entry is deleted, concurrent access
is protected against by check_and_free_fields, i.e. an existing timer
would be freed, and any existing kptr will be released by it. The
program can create further timers and kptrs after check_and_free_fields,
but they will eventually be released once the preallocated items are
freed on map destruction, even if the item is never reused again. Hence,
the deleted item sitting in the free list can still have resources
attached to it, and they would never leak.
With spin_lock, we never touch the field at all on delete or update, as
we may end up modifying the state of the lock. Since the verifier
ensures that a bpf_spin_lock call is always paired with bpf_spin_unlock
call, the program will eventually release the lock so that on reuse the
new user of the value can take the lock.
Essentially, for the preallocated case, we must assume that the map
value may always be in use by the program, even when it is sitting in
the freelist, and handle things accordingly, i.e. use proper
synchronization inside check_and_free_fields, and never reinitialize the
special fields when it is reused on update.
Fixes:
68134668c17f ("bpf: Add map side support for bpf timers.")
Acked-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/r/20220809213033.24147-3-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Kumar Kartikeya Dwivedi [Tue, 9 Aug 2022 21:30:31 +0000 (23:30 +0200)]
bpf: Allow calling bpf_prog_test kfuncs in tracing programs
In addition to TC hook, enable these in tracing programs so that they
can be used in selftests.
Acked-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20220809213033.24147-2-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Aijun Sun [Thu, 4 Aug 2022 02:54:42 +0000 (10:54 +0800)]
bpf, arm64: Allocate program buffer using kvcalloc instead of kcalloc
It is not necessary to allocate contiguous physical memory for BPF
program buffer using kcalloc. When the BPF program is large more than
memory page size, kcalloc allocates multiple memory pages from buddy
system. If the device can not provide sufficient memory, for example
in low-end android devices [0], memory allocation for BPF program is
likely to fail.
Test cases in lib/test_bpf.c all pass on ARM64 QEMU.
[0]
AndroidTestSuit: page allocation failure: order:4,
mode:0x40dc0(GFP_KERNEL|__GFP_COMP|__GFP_ZERO), nodemask=(null),cpuset=foreground,mems_allowed=0
Call trace:
dump_stack+0xa4/0x114
warn_alloc+0xf8/0x14c
__alloc_pages_slowpath+0xac8/0xb14
__alloc_pages_nodemask+0x194/0x3d0
kmalloc_order_trace+0x44/0x1e8
__kmalloc+0x29c/0x66c
bpf_int_jit_compile+0x17c/0x568
bpf_prog_select_runtime+0x4c/0x1b0
bpf_prepare_filter+0x5fc/0x6bc
bpf_prog_create_from_user+0x118/0x1c0
seccomp_set_mode_filter+0x1c4/0x7cc
__do_sys_prctl+0x380/0x1424
__arm64_sys_prctl+0x20/0x2c
el0_svc_common+0xc8/0x22c
el0_svc_handler+0x1c/0x28
el0_svc+0x8/0x100
Signed-off-by: Aijun Sun <aijun.sun@unisoc.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20220804025442.22524-1-aijun.sun@unisoc.com
Stanislav Fomichev [Thu, 4 Aug 2022 20:11:40 +0000 (13:11 -0700)]
selftests/bpf: Excercise bpf_obj_get_info_by_fd for bpf2bpf
Apparently, no existing selftest covers it. Add a new one where
we load cgroup/bind4 program and attach fentry to it. Calling
bpf_obj_get_info_by_fd on the fentry program should return non-zero
btf_id/btf_obj_id instead of crashing the kernel.
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20220804201140.1340684-2-sdf@google.com
Stanislav Fomichev [Thu, 4 Aug 2022 20:11:39 +0000 (13:11 -0700)]
bpf: Use proper target btf when exporting attach_btf_obj_id
When attaching to program, the program itself might not be attached
to anything (and, hence, might not have attach_btf), so we can't
unconditionally use 'prog->aux->dst_prog->aux->attach_btf'.
Instead, use bpf_prog_get_target_btf to pick proper target BTF:
* when attached to dst_prog, use dst_prog->aux->btf
* when attached to kernel btf, use prog->aux->attach_btf
Fixes:
b79c9fc9551b ("bpf: implement BPF_PROG_QUERY for BPF_LSM_CGROUP")
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Hao Luo <haoluo@google.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20220804201140.1340684-1-sdf@google.com
Jiri Olsa [Tue, 2 Aug 2022 16:33:24 +0000 (18:33 +0200)]
mptcp, btf: Add struct mptcp_sock definition when CONFIG_MPTCP is disabled
The btf_sock_ids array needs struct mptcp_sock BTF ID for the
bpf_skc_to_mptcp_sock helper.
When CONFIG_MPTCP is disabled, the 'struct mptcp_sock' is not
defined and resolve_btfids will complain with:
[...]
BTFIDS vmlinux
WARN: resolve_btfids: unresolved symbol mptcp_sock
[...]
Add an empty definition for struct mptcp_sock when CONFIG_MPTCP
is disabled.
Fixes:
3bc253c2e652 ("bpf: Add bpf_skc_to_mptcp_sock_proto")
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20220802163324.1873044-1-jolsa@kernel.org
Jiri Olsa [Tue, 2 Aug 2022 13:56:51 +0000 (15:56 +0200)]
bpf: Cleanup ftrace hash in bpf_trampoline_put
We need to release possible hash from trampoline fops object
before removing it, otherwise we leak it.
Fixes:
00963a2e75a8 ("bpf: Support bpf_trampoline on functions with IPMODIFY (e.g. livepatch)")
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/bpf/20220802135651.1794015-1-jolsa@kernel.org
Jinghao Jia [Fri, 29 Jul 2022 20:17:13 +0000 (20:17 +0000)]
BPF: Fix potential bad pointer dereference in bpf_sys_bpf()
The bpf_sys_bpf() helper function allows an eBPF program to load another
eBPF program from within the kernel. In this case the argument union
bpf_attr pointer (as well as the insns and license pointers inside) is a
kernel address instead of a userspace address (which is the case of a
usual bpf() syscall). To make the memory copying process in the syscall
work in both cases, bpfptr_t was introduced to wrap around the pointer
and distinguish its origin. Specifically, when copying memory contents
from a bpfptr_t, a copy_from_user() is performed in case of a userspace
address and a memcpy() is performed for a kernel address.
This can lead to problems because the in-kernel pointer is never checked
for validity. The problem happens when an eBPF syscall program tries to
call bpf_sys_bpf() to load a program but provides a bad insns pointer --
say 0xdeadbeef -- in the bpf_attr union. The helper calls __sys_bpf()
which would then call bpf_prog_load() to load the program.
bpf_prog_load() is responsible for copying the eBPF instructions to the
newly allocated memory for the program; it creates a kernel bpfptr_t for
insns and invokes copy_from_bpfptr(). Internally, all bpfptr_t
operations are backed by the corresponding sockptr_t operations, which
performs direct memcpy() on kernel pointers for copy_from/strncpy_from
operations. Therefore, the code is always happy to dereference the bad
pointer to trigger a un-handle-able page fault and in turn an oops.
However, this is not supposed to happen because at that point the eBPF
program is already verified and should not cause a memory error.
Sample KASAN trace:
[ 25.685056][ T228] ==================================================================
[ 25.685680][ T228] BUG: KASAN: user-memory-access in copy_from_bpfptr+0x21/0x30
[ 25.686210][ T228] Read of size 80 at addr
00000000deadbeef by task poc/228
[ 25.686732][ T228]
[ 25.686893][ T228] CPU: 3 PID: 228 Comm: poc Not tainted 5.19.0-rc7 #7
[ 25.687375][ T228] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS d55cb5a 04/01/2014
[ 25.687991][ T228] Call Trace:
[ 25.688223][ T228] <TASK>
[ 25.688429][ T228] dump_stack_lvl+0x73/0x9e
[ 25.688747][ T228] print_report+0xea/0x200
[ 25.689061][ T228] ? copy_from_bpfptr+0x21/0x30
[ 25.689401][ T228] ? _printk+0x54/0x6e
[ 25.689693][ T228] ? _raw_spin_lock_irqsave+0x70/0xd0
[ 25.690071][ T228] ? copy_from_bpfptr+0x21/0x30
[ 25.690412][ T228] kasan_report+0xb5/0xe0
[ 25.690716][ T228] ? copy_from_bpfptr+0x21/0x30
[ 25.691059][ T228] kasan_check_range+0x2bd/0x2e0
[ 25.691405][ T228] ? copy_from_bpfptr+0x21/0x30
[ 25.691734][ T228] memcpy+0x25/0x60
[ 25.692000][ T228] copy_from_bpfptr+0x21/0x30
[ 25.692328][ T228] bpf_prog_load+0x604/0x9e0
[ 25.692653][ T228] ? cap_capable+0xb4/0xe0
[ 25.692956][ T228] ? security_capable+0x4f/0x70
[ 25.693324][ T228] __sys_bpf+0x3af/0x580
[ 25.693635][ T228] bpf_sys_bpf+0x45/0x240
[ 25.693937][ T228] bpf_prog_f0ec79a5a3caca46_bpf_func1+0xa2/0xbd
[ 25.694394][ T228] bpf_prog_run_pin_on_cpu+0x2f/0xb0
[ 25.694756][ T228] bpf_prog_test_run_syscall+0x146/0x1c0
[ 25.695144][ T228] bpf_prog_test_run+0x172/0x190
[ 25.695487][ T228] __sys_bpf+0x2c5/0x580
[ 25.695776][ T228] __x64_sys_bpf+0x3a/0x50
[ 25.696084][ T228] do_syscall_64+0x60/0x90
[ 25.696393][ T228] ? fpregs_assert_state_consistent+0x50/0x60
[ 25.696815][ T228] ? exit_to_user_mode_prepare+0x36/0xa0
[ 25.697202][ T228] ? syscall_exit_to_user_mode+0x20/0x40
[ 25.697586][ T228] ? do_syscall_64+0x6e/0x90
[ 25.697899][ T228] entry_SYSCALL_64_after_hwframe+0x63/0xcd
[ 25.698312][ T228] RIP: 0033:0x7f6d543fb759
[ 25.698624][ T228] Code: 08 5b 89 e8 5d c3 66 2e 0f 1f 84 00 00 00 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 97 a6 0e 00 f7 d8 64 89 01 48
[ 25.699946][ T228] RSP: 002b:
00007ffc3df78468 EFLAGS:
00000287 ORIG_RAX:
0000000000000141
[ 25.700526][ T228] RAX:
ffffffffffffffda RBX:
00007ffc3df78628 RCX:
00007f6d543fb759
[ 25.701071][ T228] RDX:
0000000000000090 RSI:
00007ffc3df78478 RDI:
000000000000000a
[ 25.701636][ T228] RBP:
00007ffc3df78510 R08:
0000000000000000 R09:
0000000000300000
[ 25.702191][ T228] R10:
0000000000000005 R11:
0000000000000287 R12:
0000000000000000
[ 25.702736][ T228] R13:
00007ffc3df78638 R14:
000055a1584aca68 R15:
00007f6d5456a000
[ 25.703282][ T228] </TASK>
[ 25.703490][ T228] ==================================================================
[ 25.704050][ T228] Disabling lock debugging due to kernel taint
Update copy_from_bpfptr() and strncpy_from_bpfptr() so that:
- for a kernel pointer, it uses the safe copy_from_kernel_nofault() and
strncpy_from_kernel_nofault() functions.
- for a userspace pointer, it performs copy_from_user() and
strncpy_from_user().
Fixes:
af2ac3e13e45 ("bpf: Prepare bpf syscall to be used from kernel and user space.")
Link: https://lore.kernel.org/bpf/20220727132905.45166-1-jinghao@linux.ibm.com/
Signed-off-by: Jinghao Jia <jinghao@linux.ibm.com>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/r/20220729201713.88688-1-jinghao@linux.ibm.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Paul E. McKenney [Tue, 2 Aug 2022 17:39:13 +0000 (10:39 -0700)]
bpf: Update bpf_design_QA.rst to clarify that BTF_ID does not ABIify a function
This patch updates bpf_design_QA.rst to clarify that mentioning a function
to the BTF_ID macro does not make that function become part of the Linux
kernel's ABI.
Suggested-by: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Link: https://lore.kernel.org/r/20220802173913.4170192-3-paulmck@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Paul E. McKenney [Tue, 2 Aug 2022 17:39:12 +0000 (10:39 -0700)]
bpf: Update bpf_design_QA.rst to clarify that attaching to functions is not ABI
This patch updates bpf_design_QA.rst to clarify that the ability to
attach a BPF program to an arbitrary function in the kernel does not
make that function become part of the Linux kernel's ABI.
[ paulmck: Apply Daniel Borkmann feedback. ]
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Link: https://lore.kernel.org/r/20220802173913.4170192-2-paulmck@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Paul E. McKenney [Tue, 2 Aug 2022 17:39:11 +0000 (10:39 -0700)]
bpf: Update bpf_design_QA.rst to clarify that kprobes is not ABI
This patch updates bpf_design_QA.rst to clarify that the ability to
attach a BPF program to a given point in the kernel code via kprobes
does not make that attachment point be part of the Linux kernel's ABI.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Link: https://lore.kernel.org/r/20220802173913.4170192-1-paulmck@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Yu Xiao [Tue, 2 Aug 2022 09:33:55 +0000 (10:33 +0100)]
nfp: ethtool: fix the display error of `ethtool -m DEVNAME`
The port flag isn't set to `NFP_PORT_CHANGED` when using
`ethtool -m DEVNAME` before, so the port state (e.g. interface)
cannot be updated. Therefore, it caused that `ethtool -m DEVNAME`
sometimes cannot read the correct information.
E.g. `ethtool -m DEVNAME` cannot work when load driver before plug
in optical module, as the port interface is still NONE without port
update.
Now update the port state before sending info to NIC to ensure that
port interface is correct (latest state).
Fixes:
61f7c6f44870 ("nfp: implement ethtool get module EEPROM")
Reviewed-by: Louis Peens <louis.peens@corigine.com>
Signed-off-by: Yu Xiao <yu.xiao@corigine.com>
Signed-off-by: Simon Horman <simon.horman@corigine.com>
Link: https://lore.kernel.org/r/20220802093355.69065-1-simon.horman@corigine.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Florian Fainelli [Mon, 1 Aug 2022 23:34:03 +0000 (16:34 -0700)]
net: phy: Warn about incorrect mdio_bus_phy_resume() state
Calling mdio_bus_phy_resume() with neither the PHY state machine set to
PHY_HALTED nor phydev->mac_managed_pm set to true is a good indication
that we can produce a race condition looking like this:
CPU0 CPU1
bcmgenet_resume
-> phy_resume
-> phy_init_hw
-> phy_start
-> phy_resume
phy_start_aneg()
mdio_bus_phy_resume
-> phy_resume
-> phy_write(..., BMCR_RESET)
-> usleep() -> phy_read()
with the phy_resume() function triggering a PHY behavior that might have
to be worked around with (see
bf8bfc4336f7 ("net: phy: broadcom: Fix
brcm_fet_config_init()") for instance) that ultimately leads to an error
reading from the PHY.
Fixes:
fba863b81604 ("net: phy: make PHY PM ops a no-op if MAC driver manages PHY PM")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Link: https://lore.kernel.org/r/20220801233403.258871-1-f.fainelli@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Thu, 4 Aug 2022 02:20:17 +0000 (19:20 -0700)]
Merge branch 'make-dsa-work-with-bonding-s-arp-monitor'
Vladimir Oltean says:
====================
Make DSA work with bonding's ARP monitor
Since commit
2b86cb829976 ("net: dsa: declare lockless TX feature for
slave ports") in v5.7, DSA breaks the ARP monitoring logic from the
bonding driver, fact which was pointed out by Brian Hutchinson who uses
a linux-5.10.y stable kernel.
Initially I got lured by other similar hacks introduced for other
NETIF_F_LLTX drivers, which, inspired by the bonding documentation,
update the trans_start of their TX queues by hand.
However Jakub pointed out that this simply isn't a proper solution, and
after coming to think more about it, I agree, and it doesn't work
properly with DSA nor is it maintainable for the future changes I plan
for it (multiple DSA masters in a LAG).
I've tested these changes using a DSA-based setup and a veth-based
setup, using the active-backup mode and ARP monitoring, with and without
arp_validate.
Link to v1:
https://patchwork.kernel.org/project/netdevbpf/patch/
20220715232641.952532-1-vladimir.oltean@nxp.com/
Link to v2:
https://patchwork.kernel.org/project/netdevbpf/patch/
20220727152000.3616086-1-vladimir.oltean@nxp.com/
====================
Link: https://lore.kernel.org/r/20220731124108.2810233-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Vladimir Oltean [Sun, 31 Jul 2022 12:41:08 +0000 (15:41 +0300)]
docs: net: bonding: remove mentions of trans_start
ARP monitoring no longer depends on dev->last_rx or dev_trans_start(),
so delete this information.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Vladimir Oltean [Sun, 31 Jul 2022 12:41:07 +0000 (15:41 +0300)]
Revert "veth: Add updating of trans_start"
This reverts commit
e66e257a5d8368d9c0ba13d4630f474436533e8b. The veth
driver no longer needs these hacks which are slightly detrimential to
the fast path performance, because the bonding driver is keeping track
of TX times of ARP and NS probes by itself, which it should.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Vladimir Oltean [Sun, 31 Jul 2022 12:41:06 +0000 (15:41 +0300)]
net/sched: remove hacks added to dev_trans_start() for bonding to work
Now that the bonding driver keeps track of the last TX time of ARP and
NS probes, we effectively revert the following commits:
32d3e51a82d4 ("net_sched: use macvlan real dev trans_start in dev_trans_start()")
07ce76aa9bcf ("net_sched: make dev_trans_start return vlan's real dev trans_start")
Note that the approach of continuing to hack at this function would not
get us very far, hence the desire to take a different approach. DSA is
also a virtual device that uses NETIF_F_LLTX, but there, many uppers
share the same lower (DSA master, i.e. the physical host port of a
switch). By making dev_trans_start() on a DSA interface return the
dev_trans_start() of the master, we effectively assume that all other
DSA interfaces are silent, otherwise this corrupts the validity of the
probe timestamp data from the bonding driver's perspective.
Furthermore, the hacks didn't take into consideration the fact that the
lower interface of @dev may not have been physical either. For example,
VLAN over VLAN, or DSA with 2 masters in a LAG.
And even furthermore, there are NETIF_F_LLTX devices which are not
stacked, like veth. The hack here would not work with those, because it
would not have to provide the bonding driver something to chew at all.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Vladimir Oltean [Sun, 31 Jul 2022 12:41:05 +0000 (15:41 +0300)]
net: bonding: replace dev_trans_start() with the jiffies of the last ARP/NS
The bonding driver piggybacks on time stamps kept by the network stack
for the purpose of the netdev TX watchdog, and this is problematic
because it does not work with NETIF_F_LLTX devices.
It is hard to say why the driver looks at dev_trans_start() of the
slave->dev, considering that this is updated even by non-ARP/NS probes
sent by us, and even by traffic not sent by us at all (for example PTP
on physical slave devices). ARP monitoring in active-backup mode appears
to still work even if we track only the last TX time of actual ARP
probes.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Linus Torvalds [Wed, 3 Aug 2022 23:29:08 +0000 (16:29 -0700)]
Merge tag 'net-next-6.0' of git://git./linux/kernel/git/netdev/net-next
Pull networking changes from Paolo Abeni:
"Core:
- Refactor the forward memory allocation to better cope with memory
pressure with many open sockets, moving from a per socket cache to
a per-CPU one
- Replace rwlocks with RCU for better fairness in ping, raw sockets
and IP multicast router.
- Network-side support for IO uring zero-copy send.
- A few skb drop reason improvements, including codegen the source
file with string mapping instead of using macro magic.
- Rename reference tracking helpers to a more consistent netdev_*
schema.
- Adapt u64_stats_t type to address load/store tearing issues.
- Refine debug helper usage to reduce the log noise caused by bots.
BPF:
- Improve socket map performance, avoiding skb cloning on read
operation.
- Add support for 64 bits enum, to match types exposed by kernel.
- Introduce support for sleepable uprobes program.
- Introduce support for enum textual representation in libbpf.
- New helpers to implement synproxy with eBPF/XDP.
- Improve loop performances, inlining indirect calls when possible.
- Removed all the deprecated libbpf APIs.
- Implement new eBPF-based LSM flavor.
- Add type match support, which allow accurate queries to the eBPF
used types.
- A few TCP congetsion control framework usability improvements.
- Add new infrastructure to manipulate CT entries via eBPF programs.
- Allow for livepatch (KLP) and BPF trampolines to attach to the same
kernel function.
Protocols:
- Introduce per network namespace lookup tables for unix sockets,
increasing scalability and reducing contention.
- Preparation work for Wi-Fi 7 Multi-Link Operation (MLO) support.
- Add support to forciby close TIME_WAIT TCP sockets via user-space
tools.
- Significant performance improvement for the TLS 1.3 receive path,
both for zero-copy and not-zero-copy.
- Support for changing the initial MTPCP subflow priority/backup
status
- Introduce virtually contingus buffers for sockets over RDMA, to
cope better with memory pressure.
- Extend CAN ethtool support with timestamping capabilities
- Refactor CAN build infrastructure to allow building only the needed
features.
Driver API:
- Remove devlink mutex to allow parallel commands on multiple links.
- Add support for pause stats in distributed switch.
- Implement devlink helpers to query and flash line cards.
- New helper for phy mode to register conversion.
New hardware / drivers:
- Ethernet DSA driver for the rockchip mt7531 on BPI-R2 Pro.
- Ethernet DSA driver for the Renesas RZ/N1 A5PSW switch.
- Ethernet DSA driver for the Microchip LAN937x switch.
- Ethernet PHY driver for the Aquantia AQR113C EPHY.
- CAN driver for the OBD-II ELM327 interface.
- CAN driver for RZ/N1 SJA1000 CAN controller.
- Bluetooth: Infineon CYW55572 Wi-Fi plus Bluetooth combo device.
Drivers:
- Intel Ethernet NICs:
- i40e: add support for vlan pruning
- i40e: add support for XDP framented packets
- ice: improved vlan offload support
- ice: add support for PPPoE offload
- Mellanox Ethernet (mlx5)
- refactor packet steering offload for performance and scalability
- extend support for TC offload
- refactor devlink code to clean-up the locking schema
- support stacked vlans for bridge offloads
- use TLS objects pool to improve connection rate
- Netronome Ethernet NICs (nfp):
- extend support for IPv6 fields mangling offload
- add support for vepa mode in HW bridge
- better support for virtio data path acceleration (VDPA)
- enable TSO by default
- Microsoft vNIC driver (mana)
- add support for XDP redirect
- Others Ethernet drivers:
- bonding: add per-port priority support
- microchip lan743x: extend phy support
- Fungible funeth: support UDP segmentation offload and XDP xmit
- Solarflare EF100: add support for virtual function representors
- MediaTek SoC: add XDP support
- Mellanox Ethernet/IB switch (mlxsw):
- dropped support for unreleased H/W (XM router).
- improved stats accuracy
- unified bridge model coversion improving scalability (parts 1-6)
- support for PTP in Spectrum-2 asics
- Broadcom PHYs
- add PTP support for BCM54210E
- add support for the BCM53128 internal PHY
- Marvell Ethernet switches (prestera):
- implement support for multicast forwarding offload
- Embedded Ethernet switches:
- refactor OcteonTx MAC filter for better scalability
- improve TC H/W offload for the Felix driver
- refactor the Microchip ksz8 and ksz9477 drivers to share the
probe code (parts 1, 2), add support for phylink mac
configuration
- Other WiFi:
- Microchip wilc1000: diable WEP support and enable WPA3
- Atheros ath10k: encapsulation offload support
Old code removal:
- Neterion vxge ethernet driver: this is untouched since more than 10 years"
* tag 'net-next-6.0' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1890 commits)
doc: sfp-phylink: Fix a broken reference
wireguard: selftests: support UML
wireguard: allowedips: don't corrupt stack when detecting overflow
wireguard: selftests: update config fragments
wireguard: ratelimiter: use hrtimer in selftest
net/mlx5e: xsk: Discard unaligned XSK frames on striding RQ
net: usb: ax88179_178a: Bind only to vendor-specific interface
selftests: net: fix IOAM test skip return code
net: usb: make USB_RTL8153_ECM non user configurable
net: marvell: prestera: remove reduntant code
octeontx2-pf: Reduce minimum mtu size to 60
net: devlink: Fix missing mutex_unlock() call
net/tls: Remove redundant workqueue flush before destroy
net: txgbe: Fix an error handling path in txgbe_probe()
net: dsa: Fix spelling mistakes and cleanup code
Documentation: devlink: add add devlink-selftests to the table of contents
dccp: put dccp_qpolicy_full() and dccp_qpolicy_push() in the same lock
net: ionic: fix error check for vlan flags in ionic_set_nic_features()
net: ice: fix error NETIF_F_HW_VLAN_CTAG_FILTER check in ice_vsi_sync_fltr()
nfp: flower: add support for tunnel offload without key ID
...
Linus Torvalds [Wed, 3 Aug 2022 22:26:04 +0000 (15:26 -0700)]
Merge tag 'ata-5.20-rc1' of git://git./linux/kernel/git/dlemoal/libata
Pull ATA updates from Damien Le Moal:
- Some code refactoring for the pata_hpt37x and pata_hpt3x2n drivers,
from Sergei.
- Several patches to cleanup in libata-core, libata-scsi and libata-eh
code: fixes arguments and variables types, change some functions
declaration to static and fix for a typo in a comment. From Sergey
and Xiang.
- Fix a compilation warning in the pata_macio driver, from me.
- A fix for the expected number of resources in the sata_mv driver fix,
from Andrew.
* tag 'ata-5.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata:
ata: sata_mv: Fixes expected number of resources now IRQs are gone
ata: libata-scsi: fix result type of ata_ioc32()
ata: pata_macio: Fix compilation warning
ata: libata-eh: fix sloppy result type of ata_internal_cmd_timeout()
ata: libata-core: fix sloppy parameter type in ata_exec_internal[_sg]()
ata: make ata_port::fastdrain_cnt *unsigned int*
ata: libata-eh: fix sloppy result type of ata_eh_nr_in_flight()
ata: libata-core: make ata_exec_internal_sg() *static*
ata: make transfer mode masks *unsigned int*
ata: libata-core: get rid of *else* branches in ata_id_n_sectors()
ata: libata-core: fix sloppy typing in ata_id_n_sectors()
ata: pata_hpt3x2n: pass base DPLL frequency to hpt3x2n_pci_clock()
ata: pata_hpt37x: merge hpt374_read_freq() to hpt37x_pci_clock()
ata: pata_hpt37x: factor out hpt37x_pci_clock()
ata: pata_hpt37x: move claculating PCI clock from hpt37x_clock_slot()
ata: libata: Fix syntax errors in comments
Linus Torvalds [Wed, 3 Aug 2022 22:21:53 +0000 (15:21 -0700)]
Merge tag 'zonefs-5.20-rc1' of git://git./linux/kernel/git/dlemoal/zonefs
Pull zonefs update from Damien Le Moal:
"A single change for this cycle to simplify handling of the memory page
used as super block buffer during mount (from Fabio)"
* tag 'zonefs-5.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/zonefs:
zonefs: Call page_address() on page acquired with GFP_KERNEL flag
Linus Torvalds [Wed, 3 Aug 2022 22:16:49 +0000 (15:16 -0700)]
Merge tag 'iomap-5.20-merge-1' of git://git./fs/xfs/xfs-linux
Pull iomap updates from Darrick Wong:
"The most notable change in this first batch is that we no longer
schedule pages beyond i_size for writeback, preferring instead to let
truncate deal with those pages.
Next week, there may be a second pull request to remove
iomap_writepage from the other two filesystems (gfs2/zonefs) that use
iomap for buffered IO. This follows in the same vein as the recent
removal of writepage from XFS, since it hasn't been triggered in a few
years; it does nothing during direct reclaim; and as far as the people
who examined the patchset can tell, it's moving the codebase in the
right direction.
However, as it was a late addition to for-next, I'm holding off on
that section for another week of testing to see if anyone can come up
with a solid reason for holding off in the meantime.
Summary:
- Skip writeback for pages that are completely beyond EOF
- Minor code cleanups"
* tag 'iomap-5.20-merge-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
dax: set did_zero to true when zeroing successfully
iomap: set did_zero to true when zeroing successfully
iomap: skip pages past eof in iomap_do_writepage()
Linus Torvalds [Wed, 3 Aug 2022 22:12:40 +0000 (15:12 -0700)]
Merge tag 'affs-5.20-tag' of git://git./linux/kernel/git/kdave/linux
Pull affs fix from David Sterba:
"One update to AFFS, switching away from the kmap/kmap_atomic API"
* tag 'affs-5.20-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
affs: use memcpy_to_page and remove replace kmap_atomic()
Linus Torvalds [Wed, 3 Aug 2022 21:54:52 +0000 (14:54 -0700)]
Merge tag 'for-5.20-tag' of git://git./linux/kernel/git/kdave/linux
Pull btrfs updates from David Sterba:
"This brings some long awaited changes, the send protocol bump,
otherwise lots of small improvements and fixes. The main core part is
reworking bio handling, cleaning up the submission and endio and
improving error handling.
There are some changes outside of btrfs adding helpers or updating
API, listed at the end of the changelog.
Features:
- sysfs:
- export chunk size, in debug mode add tunable for setting its size
- show zoned among features (was only in debug mode)
- show commit stats (number, last/max/total duration)
- send protocol updated to 2
- new commands:
- ability write larger data chunks than 64K
- send raw compressed extents (uses the encoded data ioctls),
ie. no decompression on send side, no compression needed on
receive side if supported
- send 'otime' (inode creation time) among other timestamps
- send file attributes (a.k.a file flags and xflags)
- this is first version bump, backward compatibility on send and
receive side is provided
- there are still some known and wanted commands that will be
implemented in the near future, another version bump will be
needed, however we want to minimize that to avoid causing
usability issues
- print checksum type and implementation at mount time
- don't print some messages at mount (mentioned as people asked about
it), we want to print messages namely for new features so let's
make some space for that
- big metadata - this has been supported for a long time and is
not a feature that's worth mentioning
- skinny metadata - same reason, set by default by mkfs
Performance improvements:
- reduced amount of reserved metadata for delayed items
- when inserted items can be batched into one leaf
- when deleting batched directory index items
- when deleting delayed items used for deletion
- overall improved count of files/sec, decreased subvolume lock
contention
- metadata item access bounds checker micro-optimized, with a few
percent of improved runtime for metadata-heavy operations
- increase direct io limit for read to 256 sectors, improved
throughput by 3x on sample workload
Notable fixes:
- raid56
- reduce parity writes, skip sectors of stripe when there are no
data updates
- restore reading from on-disk data instead of using stripe cache,
this reduces chances to damage correct data due to RMW cycle
- refuse to replay log with unknown incompat read-only feature bit
set
- zoned
- fix page locking when COW fails in the middle of allocation
- improved tracking of active zones, ZNS drives may limit the
number and there are ENOSPC errors due to that limit and not
actual lack of space
- adjust maximum extent size for zone append so it does not cause
late ENOSPC due to underreservation
- mirror reading error messages show the mirror number
- don't fallback to buffered IO for NOWAIT direct IO writes, we don't
have the NOWAIT semantics for buffered io yet
- send, fix sending link commands for existing file paths when there
are deleted and created hardlinks for same files
- repair all mirrors for profiles with more than 1 copy (raid1c34)
- fix repair of compressed extents, unify where error detection and
repair happen
Core changes:
- bio completion cleanups
- don't double defer compression bios
- simplify endio workqueues
- add more data to btrfs_bio to avoid allocation for read requests
- rework bio error handling so it's same what block layer does,
the submission works and errors are consumed in endio
- when asynchronous bio offload fails fall back to synchronous
checksum calculation to avoid errors under writeback or memory
pressure
- new trace points
- raid56 events
- ordered extent operations
- super block log_root_transid deprecated (never used)
- mixed_backref and big_metadata sysfs feature files removed, they've
been default for sufficiently long time, there are no known users
and mixed_backref could be confused with mixed_groups
Non-btrfs changes, API updates:
- minor highmem API update to cover const arguments
- switch all kmap/kmap_atomic to kmap_local
- remove redundant flush_dcache_page()
- address_space_operations::writepage callback removed
- add bdev_max_segments() helper"
* tag 'for-5.20-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: (163 commits)
btrfs: don't call btrfs_page_set_checked in finish_compressed_bio_read
btrfs: fix repair of compressed extents
btrfs: remove the start argument to check_data_csum and export
btrfs: pass a btrfs_bio to btrfs_repair_one_sector
btrfs: simplify the pending I/O counting in struct compressed_bio
btrfs: repair all known bad mirrors
btrfs: merge btrfs_dev_stat_print_on_error with its only caller
btrfs: join running log transaction when logging new name
btrfs: simplify error handling in btrfs_lookup_dentry
btrfs: send: always use the rbtree based inode ref management infrastructure
btrfs: send: fix sending link commands for existing file paths
btrfs: send: introduce recorded_ref_alloc and recorded_ref_free
btrfs: zoned: wait until zone is finished when allocation didn't progress
btrfs: zoned: write out partially allocated region
btrfs: zoned: activate necessary block group
btrfs: zoned: activate metadata block group on flush_space
btrfs: zoned: disable metadata overcommit for zoned
btrfs: zoned: introduce space_info->active_total_bytes
btrfs: zoned: finish least available block group on data bg allocation
btrfs: let can_allocate_chunk return error
...
Linus Torvalds [Wed, 3 Aug 2022 21:41:36 +0000 (14:41 -0700)]
Merge tag 'efi-efivars-removal-for-v5.20' of git://git./linux/kernel/git/efi/efi
Pull efivars sysfs interface removal from Ard Biesheuvel:
"Remove the obsolete 'efivars' sysfs based interface to the EFI
variable store, now that all users have moved to the efivarfs pseudo
file system, which was created ~10 years ago to address some
fundamental shortcomings in the sysfs based driver.
Move the 'business logic' related to which EFI variables are important
and may affect the boot flow from the efivars support layer into the
efivarfs pseudo file system, so it is no longer exposed to other parts
of the kernel"
* tag 'efi-efivars-removal-for-v5.20' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi:
efi: vars: Move efivar caching layer into efivarfs
efi: vars: Switch to new wrapper layer
efi: vars: Remove deprecated 'efivars' sysfs interface
Linus Torvalds [Wed, 3 Aug 2022 21:38:02 +0000 (14:38 -0700)]
Merge tag 'efi-next-for-v5.20' of git://git./linux/kernel/git/efi/efi
Pull EFI updates from Ard Biesheuvel:
- Enable mirrored memory for arm64
- Fix up several abuses of the efivar API
- Refactor the efivar API in preparation for moving the 'business
logic' part of it into efivarfs
- Enable ACPI PRM on arm64
* tag 'efi-next-for-v5.20' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi: (24 commits)
ACPI: Move PRM config option under the main ACPI config
ACPI: Enable Platform Runtime Mechanism(PRM) support on ARM64
ACPI: PRM: Change handler_addr type to void pointer
efi: Simplify arch_efi_call_virt() macro
drivers: fix typo in firmware/efi/memmap.c
efi: vars: Drop __efivar_entry_iter() helper which is no longer used
efi: vars: Use locking version to iterate over efivars linked lists
efi: pstore: Omit efivars caching EFI varstore access layer
efi: vars: Add thin wrapper around EFI get/set variable interface
efi: vars: Don't drop lock in the middle of efivar_init()
pstore: Add priv field to pstore_record for backend specific use
Input: applespi - avoid efivars API and invoke EFI services directly
selftests/kexec: remove broken EFI_VARS secure boot fallback check
brcmfmac: Switch to appropriate helper to load EFI variable contents
iwlwifi: Switch to proper EFI variable store interface
media: atomisp_gmin_platform: stop abusing efivar API
efi: efibc: avoid efivar API for setting variables
efi: avoid efivars layer when loading SSDTs from variables
efi: Correct comment on efi_memmap_alloc
memblock: Disable mirror feature if kernelcore is not specified
...
Linus Torvalds [Wed, 3 Aug 2022 21:03:51 +0000 (14:03 -0700)]
Merge tag 'pull-work.9p' of git://git./linux/kernel/git/viro/vfs
Pull 9p iov_iter fix from Al Viro:
"net/9p abuses iov_iter primitives - it attempts to copy _from_ a
destination-only iov_iter when it handles Rerror arriving in reply to
zero-copy request. Not hard to fix, fortunately.
This is a prereq for the iov_iter_get_pages() work in the second part
of iov_iter series, ended up in a separate branch"
* tag 'pull-work.9p' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
9p: handling Rerror without copy_from_iter_full()
Linus Torvalds [Wed, 3 Aug 2022 20:59:15 +0000 (13:59 -0700)]
Merge tag 'pull-fixes' of git://git./linux/kernel/git/viro/vfs
Pull copy_to_iter_mc fix from Al Viro:
"Backportable fix for copy_to_iter_mc() - the second part of iov_iter
work will pretty much overwrite this, but would be much harder to
backport"
* tag 'pull-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
fix short copy handling in copy_mc_pipe_to_iter()
Linus Torvalds [Wed, 3 Aug 2022 20:50:22 +0000 (13:50 -0700)]
Merge tag 'pull-work.iov_iter-base' of git://git./linux/kernel/git/viro/vfs
Pull vfs iov_iter updates from Al Viro:
"Part 1 - isolated cleanups and optimizations.
One of the goals is to reduce the overhead of using ->read_iter() and
->write_iter() instead of ->read()/->write().
new_sync_{read,write}() has a surprising amount of overhead, in
particular inside iocb_flags(). That's the explanation for the
beginning of the series is in this pile; it's not directly
iov_iter-related, but it's a part of the same work..."
* tag 'pull-work.iov_iter-base' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
first_iovec_segment(): just return address
iov_iter: massage calling conventions for first_{iovec,bvec}_segment()
iov_iter: first_{iovec,bvec}_segment() - simplify a bit
iov_iter: lift dealing with maxpages out of first_{iovec,bvec}_segment()
iov_iter_get_pages{,_alloc}(): cap the maxsize with MAX_RW_COUNT
iov_iter_bvec_advance(): don't bother with bvec_iter
copy_page_{to,from}_iter(): switch iovec variants to generic
keep iocb_flags() result cached in struct file
iocb: delay evaluation of IS_SYNC(...) until we want to check IOCB_DSYNC
struct file: use anonymous union member for rcuhead and llist
btrfs: use IOMAP_DIO_NOSYNC
teach iomap_dio_rw() to suppress dsync
No need of likely/unlikely on calls of check_copy_size()
Linus Torvalds [Wed, 3 Aug 2022 18:43:12 +0000 (11:43 -0700)]
Merge tag 'pull-work.dcache' of git://git./linux/kernel/git/viro/vfs
Pull vfs dcache updates from Al Viro:
"The main part here is making parallel lookups safe for RT - making
sure preemption is disabled in start_dir_add()/ end_dir_add() sections
(on non-RT it's automatic, on RT it needs to to be done explicitly)
and moving wakeups from __d_lookup_done() inside of such to the end of
those sections.
Wakeups can be safely delayed for as long as ->d_lock on in-lookup
dentry is held; proving that has caught a bug in d_add_ci() that
allows memory corruption when sufficiently bogus ntfs (or
case-insensitive xfs) image is mounted. Easily fixed, fortunately"
* tag 'pull-work.dcache' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
fs/dcache: Move wakeup out of i_seq_dir write held region.
fs/dcache: Move the wakeup from __d_lookup_done() to the caller.
fs/dcache: Disable preemption on i_dir_seq write side on PREEMPT_RT
d_add_ci(): make sure we don't miss d_lookup_done()
Linus Torvalds [Wed, 3 Aug 2022 18:35:20 +0000 (11:35 -0700)]
Merge tag 'pull-work.lseek' of git://git./linux/kernel/git/viro/vfs
Pull vfs lseek updates from Al Viro:
"Jason's lseek series.
Saner handling of 'lseek should fail with ESPIPE' - this gets rid of
the magical no_llseek thing and makes checks consistent.
In particular, the ad-hoc "can we do splice via internal pipe" checks
got saner (and somewhat more permissive, which is what Jason had been
after, AFAICT)"
* tag 'pull-work.lseek' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
fs: remove no_llseek
fs: check FMODE_LSEEK to control internal pipe splicing
vfio: do not set FMODE_LSEEK flag
dma-buf: remove useless FMODE_LSEEK flag
fs: do not compare against ->llseek
fs: clear or set FMODE_LSEEK based on llseek function
Linus Torvalds [Wed, 3 Aug 2022 18:31:42 +0000 (11:31 -0700)]
Merge tag 'pull-work.namei' of git://git./linux/kernel/git/viro/vfs
Pull vfs namei updates from Al Viro:
"RCU pathwalk cleanups.
Storing sampled ->d_seq of the next dentry in nameidata simplifies
life considerably, especially if we delay fetching ->d_inode until
step_into()"
* tag 'pull-work.namei' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
step_into(): move fetching ->d_inode past handle_mounts()
lookup_fast(): don't bother with inode
follow_dotdot{,_rcu}(): don't bother with inode
step_into(): lose inode argument
namei: stash the sampled ->d_seq into nameidata
namei: move clearing LOOKUP_RCU towards rcu_read_unlock()
switch try_to_unlazy_next() to __legitimize_mnt()
follow_dotdot{,_rcu}(): change calling conventions
namei: get rid of pointless unlikely(read_seqcount_retry(...))
__follow_mount_rcu(): verify that mount_lock remains unchanged
Linus Torvalds [Wed, 3 Aug 2022 17:35:43 +0000 (10:35 -0700)]
Merge tag 'folio-6.0' of git://git.infradead.org/users/willy/pagecache
Pull folio updates from Matthew Wilcox:
- Fix an accounting bug that made NR_FILE_DIRTY grow without limit
when running xfstests
- Convert more of mpage to use folios
- Remove add_to_page_cache() and add_to_page_cache_locked()
- Convert find_get_pages_range() to filemap_get_folios()
- Improvements to the read_cache_page() family of functions
- Remove a few unnecessary checks of PageError
- Some straightforward filesystem conversions to use folios
- Split PageMovable users out from address_space_operations into
their own movable_operations
- Convert aops->migratepage to aops->migrate_folio
- Remove nobh support (Christoph Hellwig)
* tag 'folio-6.0' of git://git.infradead.org/users/willy/pagecache: (78 commits)
fs: remove the NULL get_block case in mpage_writepages
fs: don't call ->writepage from __mpage_writepage
fs: remove the nobh helpers
jfs: stop using the nobh helper
ext2: remove nobh support
ntfs3: refactor ntfs_writepages
mm/folio-compat: Remove migration compatibility functions
fs: Remove aops->migratepage()
secretmem: Convert to migrate_folio
hugetlb: Convert to migrate_folio
aio: Convert to migrate_folio
f2fs: Convert to filemap_migrate_folio()
ubifs: Convert to filemap_migrate_folio()
btrfs: Convert btrfs_migratepage to migrate_folio
mm/migrate: Add filemap_migrate_folio()
mm/migrate: Convert migrate_page() to migrate_folio()
nfs: Convert to migrate_folio
btrfs: Convert btree_migratepage to migrate_folio
mm/migrate: Convert expected_page_refs() to folio_expected_refs()
mm/migrate: Convert buffer_migrate_page() to buffer_migrate_folio()
...
Linus Torvalds [Wed, 3 Aug 2022 17:02:28 +0000 (10:02 -0700)]
Merge tag 'xarray-6.0' of git://git.infradead.org/users/willy/xarray
Pull XArray/IDR updates from Matthew Wilcox:
- Add appropriate might_alloc() annotations to the XArray APIs
- Document that the IDR is deprecated
* tag 'xarray-6.0' of git://git.infradead.org/users/willy/xarray:
IDR: Note that the IDR API is deprecated
XArray: Add calls to might_alloc()
Linus Torvalds [Wed, 3 Aug 2022 16:45:08 +0000 (09:45 -0700)]
Merge tag 'cgroup-for-5.20' of git://git./linux/kernel/git/tj/cgroup
Pull cgroup updates from Tejun Heo:
"Several core optimizations:
- threadgroup_rwsem write locking is skipped when configuring
controllers in empty subtrees.
Combined with CLONE_INTO_CGROUP, this allows the common static
usage pattern to not grab threadgroup_rwsem at all (glibc still
doesn't seem ready for CLONE_INTO_CGROUP unfortunately).
- threadgroup_rwsem used to be put into non-percpu mode by default
due to latency concerns in specific use cases. There's no reason
for everyone else to pay for it. Make the behavior optional.
- psi no longer allocates memory when disabled.
... along with some code cleanups"
* tag 'cgroup-for-5.20' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
cgroup: Skip subtree root in cgroup_update_dfl_csses()
cgroup: remove "no" prefixed mount options
cgroup: Make !percpu threadgroup_rwsem operations optional
cgroup: Add "no" prefixed mount options
cgroup: Elide write-locking threadgroup_rwsem when updating csses on an empty subtree
cgroup.c: remove redundant check for mixable cgroup in cgroup_migrate_vet_dst
cgroup.c: add helper __cset_cgroup_from_root to cleanup duplicated codes
psi: dont alloc memory for psi by default
Paolo Abeni [Wed, 3 Aug 2022 06:50:42 +0000 (08:50 +0200)]
Merge git://git./linux/kernel/git/netdev/net
Conflicts:
net/ax25/af_ax25.c
d7c4c9e075f8c ("ax25: fix incorrect dev_tracker usage")
d62607c3fe459 ("net: rename reference+tracking helpers")
drivers/net/netdevsim/fib.c
180a6a3ee60a ("netdevsim: fib: Fix reference count leak on route deletion failure")
012ec02ae441 ("netdevsim: convert driver to use unlocked devlink API during init/fini")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Christophe JAILLET [Sun, 31 Jul 2022 05:59:00 +0000 (07:59 +0200)]
doc: sfp-phylink: Fix a broken reference
The commit in Fixes: has changed a .txt file into a .yaml file. Update the
documentation accordingly.
While at it add some `` around some file names to improve the output.
Fixes:
70991f1e6858 ("dt-bindings: net: convert sff,sfp to dtschema")
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Link: https://lore.kernel.org/r/be3c7e87ca7f027703247eccfe000b8e34805094.1659247114.git.christophe.jaillet@wanadoo.fr
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Linus Torvalds [Wed, 3 Aug 2022 02:50:47 +0000 (19:50 -0700)]
Merge tag 'flexible-array-transformations-UAPI-6.0-rc1' of git://git./linux/kernel/git/gustavoars/linux
Pull uapi flexible array update from Gustavo Silva:
"A treewide patch that replaces zero-length arrays with flexible-array
members in UAPI. This has been baking in linux-next for 5 weeks now.
'-fstrict-flex-arrays=3' is coming and we need to land these changes
to prevent issues like these in the short future:
fs/minix/dir.c:337:3: warning: 'strcpy' will always overflow; destination buffer has size 0, but the source string has length 2 (including NUL byte) [-Wfortify-source]
strcpy(de3->name, ".");
^
Since these are all [0] to [] changes, the risk to UAPI is nearly
zero. If this breaks anything, we can use a union with a new member
name"
Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101836
* tag 'flexible-array-transformations-UAPI-6.0-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gustavoars/linux:
treewide: uapi: Replace zero-length arrays with flexible-array members
Linus Torvalds [Wed, 3 Aug 2022 02:44:56 +0000 (19:44 -0700)]
Merge tag 'linux-kselftest-next-5.20-rc1' of git://git./linux/kernel/git/shuah/linux-kselftest
Pull Kselftest updates from Shuah Khan:
- timers test build fixes and cleanups for new tool chains
- removing khdr from kselftest framework and main Makefile
- changes to test output messages to improve reports
* tag 'linux-kselftest-next-5.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest: (24 commits)
Makefile: replace headers_install with headers for kselftest
selftests/landlock: drop deprecated headers dependency
selftests: timers: clocksource-switch: adapt to kselftest framework
selftests: timers: clocksource-switch: add 'runtime' command line parameter
selftests: timers: clocksource-switch: add command line switch to skip sanity check
selftests: timers: clocksource-switch: sort includes
selftests: timers: clocksource-switch: fix passing errors from child
selftests: timers: inconsistency-check: adapt to kselftest framework
selftests: timers: nanosleep: adapt to kselftest framework
selftests: timers: fix declarations of main()
selftests: timers: valid-adjtimex: build fix for newer toolchains
Makefile: add headers_install to kselftest targets
selftests: drop KSFT_KHDR_INSTALL make target
selftests: stop using KSFT_KHDR_INSTALL
selftests: drop khdr make target
selftests: drivers/dma-buf: Improve message in selftest summary
selftests/kcmp: Make the test output consistent and clear
selftests:timers: globals don't need initialization to 0
selftests/drivers/gpu: Add error messages to drm_mm.sh
selftests/tpm2: increase timeout for kselftests
...
Linus Torvalds [Wed, 3 Aug 2022 02:34:45 +0000 (19:34 -0700)]
Merge tag 'linux-kselftest-kunit-5.20-rc1' of git://git./linux/kernel/git/shuah/linux-kselftest
Pull KUnit updates from Shuah Khan:
"This consists of several fixes and an important feature to discourage
running KUnit tests on production systems. Running tests on a
production system could leave the system in a bad state.
Summary:
- Add a new taint type, TAINT_TEST to signal that a test has been
run.
This should discourage people from running these tests on
production systems, and to make it easier to tell if tests have
been run accidentally (by loading the wrong configuration, etc)
- Several documentation and tool enhancements and fixes"
* tag 'linux-kselftest-kunit-5.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest: (29 commits)
Documentation: KUnit: Fix example with compilation error
Documentation: kunit: Add CLI args for kunit_tool
kcsan: test: Add a .kunitconfig to run KCSAN tests
kunit: executor: Fix a memory leak on failure in kunit_filter_tests
clk: explicitly disable CONFIG_UML_PCI_OVER_VIRTIO in .kunitconfig
mmc: sdhci-of-aspeed: test: Use kunit_test_suite() macro
nitro_enclaves: test: Use kunit_test_suite() macro
thunderbolt: test: Use kunit_test_suite() macro
kunit: flatten kunit_suite*** to kunit_suite** in .kunit_test_suites
kunit: unify module and builtin suite definitions
selftest: Taint kernel when test module loaded
module: panic: Taint the kernel when selftest modules load
Documentation: kunit: fix example run_kunit func to allow spaces in args
Documentation: kunit: Cleanup run_wrapper, fix x-ref
kunit: test.h: fix a kernel-doc markup
kunit: tool: Enable virtio/PCI by default on UML
kunit: tool: make --kunitconfig repeatable, blindly concat
kunit: add coverage_uml.config to enable GCOV on UML
kunit: tool: refactor internal kconfig handling, allow overriding
kunit: tool: introduce --qemu_args
...
Linus Torvalds [Wed, 3 Aug 2022 02:24:24 +0000 (19:24 -0700)]
Merge tag 'docs-6.0' of git://git.lwn.net/linux
Pull documentation updates from Jonathan Corbet:
"This was a moderately busy cycle for documentation, but nothing
all that earth-shaking:
- More Chinese translations, and an update to the Italian
translations.
The Japanese, Korean, and traditional Chinese translations
are more-or-less unmaintained at this point, instead.
- Some build-system performance improvements.
- The removal of the archaic submitting-drivers.rst document,
with the movement of what useful material that remained into
other docs.
- Improvements to sphinx-pre-install to, hopefully, give more
useful suggestions.
- A number of build-warning fixes
Plus the usual collection of typo fixes, updates, and more"
* tag 'docs-6.0' of git://git.lwn.net/linux: (92 commits)
docs: efi-stub: Fix paths for x86 / arm stubs
Docs/zh_CN: Update the translation of sched-stats to 5.19-rc8
Docs/zh_CN: Update the translation of pci to 5.19-rc8
Docs/zh_CN: Update the translation of pci-iov-howto to 5.19-rc8
Docs/zh_CN: Update the translation of usage to 5.19-rc8
Docs/zh_CN: Update the translation of testing-overview to 5.19-rc8
Docs/zh_CN: Update the translation of sparse to 5.19-rc8
Docs/zh_CN: Update the translation of kasan to 5.19-rc8
Docs/zh_CN: Update the translation of iio_configfs to 5.19-rc8
doc:it_IT: align Italian documentation
docs: Remove spurious tag from admin-guide/mm/overcommit-accounting.rst
Documentation: process: Update email client instructions for Thunderbird
docs: ABI: correct QEMU fw_cfg spec path
doc/zh_CN: remove submitting-driver reference from docs
docs: zh_TW: align to submitting-drivers removal
docs: zh_CN: align to submitting-drivers removal
docs: ko_KR: howto: remove reference to removed submitting-drivers
docs: ja_JP: howto: remove reference to removed submitting-drivers
docs: it_IT: align to submitting-drivers removal
docs: process: remove outdated submitting-drivers.rst
...
Linus Torvalds [Wed, 3 Aug 2022 02:22:24 +0000 (19:22 -0700)]
Merge tag 'nolibc.2022.07.27a' of git://git./linux/kernel/git/paulmck/linux-rcu
Pull nolibc updates from Paul McKenney:
"This provides nolibc updates, perhaps most notably improved testing
via the 'cd tools/include/nolibc; make headers' command. This should
be considered a smoke test. More thorough testing is in the works"
* tag 'nolibc.2022.07.27a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu:
tools/nolibc: add a help target to list supported targets
tools/nolibc: make the default target build the headers
tools/nolibc: fix the makefile to also work as "make -C tools ..."
tools/nolibc/stdio: Add format attribute to enable printf warnings
tools/nolibc/stdlib: Support overflow checking for older compiler versions
Linus Torvalds [Wed, 3 Aug 2022 02:12:45 +0000 (19:12 -0700)]
Merge tag 'rcu.2022.07.26a' of git://git./linux/kernel/git/paulmck/linux-rcu
Pull RCU updates from Paul McKenney:
- Documentation updates
- Miscellaneous fixes
- Callback-offload updates, perhaps most notably a new
RCU_NOCB_CPU_DEFAULT_ALL Kconfig option that causes all CPUs to be
offloaded at boot time, regardless of kernel boot parameters.
This is useful to battery-powered systems such as ChromeOS and
Android. In addition, a new RCU_NOCB_CPU_CB_BOOST kernel boot
parameter prevents offloaded callbacks from interfering with
real-time workloads and with energy-efficiency mechanisms
- Polled grace-period updates, perhaps most notably making these APIs
account for both normal and expedited grace periods
- Tasks RCU updates, perhaps most notably reducing the CPU overhead of
RCU tasks trace grace periods by more than a factor of two on a
system with 15,000 tasks.
The reduction is expected to increase with the number of tasks, so it
seems reasonable to hypothesize that a system with 150,000 tasks
might see a 20-fold reduction in CPU overhead
- Torture-test updates
- Updates that merge RCU's dyntick-idle tracking into context tracking,
thus reducing the overhead of transitioning to kernel mode from
either idle or nohz_full userspace execution for kernels that track
context independently of RCU.
This is expected to be helpful primarily for kernels built with
CONFIG_NO_HZ_FULL=y
* tag 'rcu.2022.07.26a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu: (98 commits)
rcu: Add irqs-disabled indicator to expedited RCU CPU stall warnings
rcu: Diagnose extended sync_rcu_do_polled_gp() loops
rcu: Put panic_on_rcu_stall() after expedited RCU CPU stall warnings
rcutorture: Test polled expedited grace-period primitives
rcu: Add polled expedited grace-period primitives
rcutorture: Verify that polled GP API sees synchronous grace periods
rcu: Make Tiny RCU grace periods visible to polled APIs
rcu: Make polled grace-period API account for expedited grace periods
rcu: Switch polled grace-period APIs to ->gp_seq_polled
rcu/nocb: Avoid polling when my_rdp->nocb_head_rdp list is empty
rcu/nocb: Add option to opt rcuo kthreads out of RT priority
rcu: Add nocb_cb_kthread check to rcu_is_callbacks_kthread()
rcu/nocb: Add an option to offload all CPUs on boot
rcu/nocb: Fix NOCB kthreads spawn failure with rcu_nocb_rdp_deoffload() direct call
rcu/nocb: Invert rcu_state.barrier_mutex VS hotplug lock locking order
rcu/nocb: Add/del rdp to iterate from rcuog itself
rcu/tree: Add comment to describe GP-done condition in fqs loop
rcu: Initialize first_gp_fqs at declaration in rcu_gp_fqs()
rcu/kvfree: Remove useless monitor_todo flag
rcu: Cleanup RCU urgency state for offline CPU
...
Linus Torvalds [Wed, 3 Aug 2022 00:45:14 +0000 (17:45 -0700)]
Merge tag 'v5.20-p1' of git://git./linux/kernel/git/herbert/crypto-2.6
Pull crypto updates from Herbert Xu:
"API:
- Make proc files report fips module name and version
Algorithms:
- Move generic SHA1 code into lib/crypto
- Implement Chinese Remainder Theorem for RSA
- Remove blake2s
- Add XCTR with x86/arm64 acceleration
- Add POLYVAL with x86/arm64 acceleration
- Add HCTR2
- Add ARIA
Drivers:
- Add support for new CCP/PSP device ID in ccp"
* tag 'v5.20-p1' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (89 commits)
crypto: tcrypt - Remove the static variable initialisations to NULL
crypto: arm64/poly1305 - fix a read out-of-bound
crypto: hisilicon/zip - Use the bitmap API to allocate bitmaps
crypto: hisilicon/sec - fix auth key size error
crypto: ccree - Remove a useless dma_supported() call
crypto: ccp - Add support for new CCP/PSP device ID
crypto: inside-secure - Add missing MODULE_DEVICE_TABLE for of
crypto: hisilicon/hpre - don't use GFP_KERNEL to alloc mem during softirq
crypto: testmgr - some more fixes to RSA test vectors
cyrpto: powerpc/aes - delete the rebundant word "block" in comments
hwrng: via - Fix comment typo
crypto: twofish - Fix comment typo
crypto: rmd160 - fix Kconfig "its" grammar
crypto: keembay-ocs-ecc - Drop if with an always false condition
Documentation: qat: rewrite description
Documentation: qat: Use code block for qat sysfs example
crypto: lib - add module license to libsha1
crypto: lib - make the sha1 library optional
crypto: lib - move lib/sha1.c into lib/crypto/
crypto: fips - make proc files report fips module name and version
...
Linus Torvalds [Wed, 3 Aug 2022 00:31:35 +0000 (17:31 -0700)]
Merge tag 'random-6.0-rc1-for-linus' of git://git./linux/kernel/git/crng/random
Pull random number generator updates from Jason Donenfeld:
"Though there's been a decent amount of RNG-related development during
this last cycle, not all of it is coming through this tree, as this
cycle saw a shift toward tackling early boot time seeding issues,
which took place in other trees as well.
Here's a summary of the various patches:
- The CONFIG_ARCH_RANDOM .config option and the "nordrand" boot
option have been removed, as they overlapped with the more widely
supported and more sensible options, CONFIG_RANDOM_TRUST_CPU and
"random.trust_cpu". This change allowed simplifying a bit of arch
code.
- x86's RDRAND boot time test has been made a bit more robust, with
RDRAND disabled if it's clearly producing bogus results. This would
be a tip.git commit, technically, but I took it through random.git
to avoid a large merge conflict.
- The RNG has long since mixed in a timestamp very early in boot, on
the premise that a computer that does the same things, but does so
starting at different points in wall time, could be made to still
produce a different RNG state. Unfortunately, the clock isn't set
early in boot on all systems, so now we mix in that timestamp when
the time is actually set.
- User Mode Linux now uses the host OS's getrandom() syscall to
generate a bootloader RNG seed and later on treats getrandom() as
the platform's RDRAND-like faculty.
- The arch_get_random_{seed_,}_long() family of functions is now
arch_get_random_{seed_,}_longs(), which enables certain platforms,
such as s390, to exploit considerable performance advantages from
requesting multiple CPU random numbers at once, while at the same
time compiling down to the same code as before on platforms like
x86.
- A small cleanup changing a cmpxchg() into a try_cmpxchg(), from
Uros.
- A comment spelling fix"
More info about other random number changes that come in through various
architecture trees in the full commentary in the pull request:
https://lore.kernel.org/all/
20220731232428.2219258-1-Jason@zx2c4.com/
* tag 'random-6.0-rc1-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/crng/random:
random: correct spelling of "overwrites"
random: handle archrandom with multiple longs
um: seed rng using host OS rng
random: use try_cmpxchg in _credit_init_bits
timekeeping: contribute wall clock to rng on time change
x86/rdrand: Remove "nordrand" flag in favor of "random.trust_cpu"
random: remove CONFIG_ARCH_RANDOM
Andrew Lunn [Sun, 31 Jul 2022 20:49:06 +0000 (22:49 +0200)]
ata: sata_mv: Fixes expected number of resources now IRQs are gone
The commit
a1a2b7125e10 ("of/platform: Drop static setup of IRQ
resource from DT core") stopped IRQ resources being available as
platform resources. This broke the sanity check for the expected
number of resources in the Marvell SATA driver which expected two
resources, the IO memory and the interrupt.
Change the sanity check to only expect the IO memory.
Cc: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>
Fixes:
a1a2b7125e10 ("of/platform: Drop static setup of IRQ resource from DT core")
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Linus Torvalds [Tue, 2 Aug 2022 22:24:36 +0000 (15:24 -0700)]
Merge tag 'fsverity-for-linus' of git://git./fs/fscrypt/fscrypt
Pull fsverity update from Eric Biggers:
"Just a small documentation update to mention the btrfs support"
* tag 'fsverity-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/fscrypt:
fs-verity: mention btrfs support
Linus Torvalds [Tue, 2 Aug 2022 22:21:18 +0000 (15:21 -0700)]
Merge tag 'integrity-v6.0' of git://git./linux/kernel/git/zohar/linux-integrity
Pull integrity updates from Mimi Zohar:
"Aside from the one EVM cleanup patch, all the other changes are kexec
related.
On different architectures different keyrings are used to verify the
kexec'ed kernel image signature. Here are a number of preparatory
cleanup patches and the patches themselves for making the keyrings -
builtin_trusted_keyring, .machine, .secondary_trusted_keyring, and
.platform - consistent across the different architectures"
* tag 'integrity-v6.0' of git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity:
kexec, KEYS, s390: Make use of built-in and secondary keyring for signature verification
arm64: kexec_file: use more system keyrings to verify kernel image signature
kexec, KEYS: make the code in bzImage64_verify_sig generic
kexec: clean up arch_kexec_kernel_verify_sig
kexec: drop weak attribute from functions
kexec_file: drop weak attribute from functions
evm: Use IS_ENABLED to initialize .enabled
Linus Torvalds [Tue, 2 Aug 2022 22:12:13 +0000 (15:12 -0700)]
Merge tag 'safesetid-6.0' of https://github.com/micah-morton/linux
Pull SafeSetID updates from Micah Morton:
"This contains one commit that touches common kernel code, one that
adds functionality internal to the SafeSetID LSM code, and a few other
commits that only modify the SafeSetID LSM selftest.
The commit that touches common kernel code simply adds an LSM hook in
the setgroups() syscall that mirrors what is done for the existing LSM
hooks in the setuid() and setgid() syscalls. This commit combined with
the SafeSetID-specific one allow the LSM to filter setgroups() calls
according to configured rule sets in the same way that is already done
for setuid() and setgid()"
* tag 'safesetid-6.0' of https://github.com/micah-morton/linux:
LSM: SafeSetID: add setgroups() testing to selftest
LSM: SafeSetID: Add setgroups() security policy handling
security: Add LSM hook to setgroups() syscall
LSM: SafeSetID: add GID testing to selftest
LSM: SafeSetID: selftest cleanup and prepare for GIDs
LSM: SafeSetID: fix userns bug in selftest
Linus Torvalds [Tue, 2 Aug 2022 22:05:10 +0000 (15:05 -0700)]
Merge tag 'Smack-for-6.0' of https://github.com/cschaufler/smack-next
Pull msack updates from Casey Schaufler:
"Two minor code clean-ups for Smack.
One removes a touch of dead code and the other replaces an instance of
kzalloc + strncpy with kstrndup"
* tag 'Smack-for-6.0' of https://github.com/cschaufler/smack-next:
smack: Remove the redundant lsm_inode_alloc
smack: Replace kzalloc + strncpy with kstrndup
Linus Torvalds [Tue, 2 Aug 2022 21:58:58 +0000 (14:58 -0700)]
Merge tag 'lsm-pr-
20220801' of git://git./linux/kernel/git/pcmoore/lsm
Pull LSM update from Paul Moore:
"A maintainer change for the LSM layer: James has asked me to take over
the day-to-day responsibilities so a single patch to update the
MAINTAINER info"
* tag 'lsm-pr-
20220801' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm:
MAINTAINERS: update the LSM maintainer info
Linus Torvalds [Tue, 2 Aug 2022 21:56:25 +0000 (14:56 -0700)]
Merge tag 'audit-pr-
20220801' of git://git./linux/kernel/git/pcmoore/audit
Pull audit updates from Paul Moore:
"Two minor audit patches: on marks a function as static, the other
removes a redundant length check"
* tag 'audit-pr-
20220801' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit:
audit: make is_audit_feature_set() static
audit: remove redundant data_len check
Linus Torvalds [Tue, 2 Aug 2022 21:51:47 +0000 (14:51 -0700)]
Merge tag 'selinux-pr-
20220801' of git://git./linux/kernel/git/pcmoore/selinux
Pull selinux updates from Paul Moore:
"A relatively small set of patches for SELinux this time, eight patches
in total with really only one significant change.
The highlights are:
- Add support for proper labeling of memfd_secret anonymous inodes.
This will allow LSMs that implement the anonymous inode hooks to
apply security policy to memfd_secret() fds.
- Various small improvements to memory management: fixed leaks, freed
memory when needed, boundary checks.
- Hardened the selinux_audit_data struct with __randomize_layout.
- A minor documentation tweak to fix a formatting/style issue"
* tag 'selinux-pr-
20220801' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux:
selinux: selinux_add_opt() callers free memory
selinux: Add boundary check in put_entry()
selinux: fix memleak in security_read_state_kernel()
docs: selinux: add '=' signs to kernel boot options
mm: create security context for memfd_secret inodes
selinux: fix typos in comments
selinux: drop unnecessary NULL check
selinux: add __randomize_layout to selinux_audit_data
Linus Torvalds [Tue, 2 Aug 2022 21:38:59 +0000 (14:38 -0700)]
Merge tag 'hardening-v5.20-rc1' of git://git./linux/kernel/git/kees/linux
Pull hardening updates from Kees Cook:
- Fix Sparse warnings with randomizd kstack (GONG, Ruiqi)
- Replace uintptr_t with unsigned long in usercopy (Jason A. Donenfeld)
- Fix Clang -Wforward warning in LKDTM (Justin Stitt)
- Fix comment to correctly refer to STRICT_DEVMEM (Lukas Bulwahn)
- Introduce dm-verity binding logic to LoadPin LSM (Matthias Kaehlcke)
- Clean up warnings and overflow and KASAN tests (Kees Cook)
* tag 'hardening-v5.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
dm: verity-loadpin: Drop use of dm_table_get_num_targets()
kasan: test: Silence GCC 12 warnings
drivers: lkdtm: fix clang -Wformat warning
x86: mm: refer to the intended config STRICT_DEVMEM in a comment
dm: verity-loadpin: Use CONFIG_SECURITY_LOADPIN_VERITY for conditional compilation
LoadPin: Enable loading from trusted dm-verity devices
dm: Add verity helpers for LoadPin
stack: Declare {randomize_,}kstack_offset to fix Sparse warnings
lib: overflow: Do not define 64-bit tests on 32-bit
MAINTAINERS: Add a general "kernel hardening" section
usercopy: use unsigned long instead of uintptr_t
Linus Torvalds [Tue, 2 Aug 2022 21:36:19 +0000 (14:36 -0700)]
Merge tag 'execve-v5.20-rc1' of git://git./linux/kernel/git/kees/linux
Pull execve updates from Kees Cook:
- Allow unsharing time namespace on vfork+exec (Andrei Vagin)
- Replace usage of deprecated kmap APIs (Fabio M. De Francesco)
- Fix spelling mistake (Zhang Jiaming)
* tag 'execve-v5.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
exec: Call kmap_local_page() in copy_string_kernel()
exec: Fix a spelling mistake
selftests/timens: add a test for vfork+exit
fs/exec: allow to unshare a time namespace on vfork+exec
Linus Torvalds [Tue, 2 Aug 2022 21:34:03 +0000 (14:34 -0700)]
Merge tag 'seccomp-v5.20-rc1' of git://git./linux/kernel/git/kees/linux
Pull seccomp update from Kees Cook:
- Fix Clang build warning (YiFei Zhu)
* tag 'seccomp-v5.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
selftests/seccomp: Fix compile warning when CC=clang
Linus Torvalds [Tue, 2 Aug 2022 21:31:46 +0000 (14:31 -0700)]
Merge tag 'pstore-v5.20-rc1' of git://git./linux/kernel/git/kees/linux
Pull pstore updates from Kees Cook:
- Migrate to modern acomp crypto interface (Ard Biesheuvel)
- Use better return type for "rcnt" (Dan Carpenter)
* tag 'pstore-v5.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
pstore/zone: cleanup "rcnt" type
pstore: migrate to crypto acomp interface
Linus Torvalds [Tue, 2 Aug 2022 21:21:25 +0000 (14:21 -0700)]
Merge tag 'for-6.0/dm-changes' of git://git./linux/kernel/git/device-mapper/linux-dm
Pull device mapper updates from Mike Snitzer:
- Refactor DM core's mempool allocation so that it clearer by not being
split acorss files.
- Improve DM core's BLK_STS_DM_REQUEUE and BLK_STS_AGAIN handling.
- Optimize DM core's more common bio splitting by eliminating the use
of bio cloning with bio_split+bio_chain. Shift that cloning cost to
the relatively unlikely dm_io requeue case that only occurs during
error handling. Introduces dm_io_rewind() that will clone a bio that
reflects the subset of the original bio that must be requeued.
- Remove DM core's dm_table_get_num_targets() wrapper and audit all
dm_table_get_target() callers.
- Fix potential for OOM with DM writecache target by setting a default
MAX_WRITEBACK_JOBS (set to 256MiB or 1/16 of total system memory,
whichever is smaller).
- Fix DM writecache target's stats that are reported through
DM-specific table info.
- Fix use-after-free crash in dm_sm_register_threshold_callback().
- Refine DM core's Persistent Reservation handling in preparation for
broader work Mike Christie is doing to add compatibility with
Microsoft Windows Failover Cluster.
- Fix various KASAN reported bugs in the DM raid target.
- Fix DM raid target crash due to md_handle_request() bio splitting
that recurses to block core without properly initializing the bio's
bi_dev.
- Fix some code comment typos and fix some Documentation formatting.
* tag 'for-6.0/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: (29 commits)
dm: fix dm-raid crash if md_handle_request() splits bio
dm raid: fix address sanitizer warning in raid_resume
dm raid: fix address sanitizer warning in raid_status
dm: Start pr_preempt from the same starting path
dm: Fix PR release handling for non All Registrants
dm: Start pr_reserve from the same starting path
dm: Allow dm_call_pr to be used for path searches
dm: return early from dm_pr_call() if DM device is suspended
dm thin: fix use-after-free crash in dm_sm_register_threshold_callback
dm writecache: count number of blocks discarded, not number of discard bios
dm writecache: count number of blocks written, not number of write bios
dm writecache: count number of blocks read, not number of read bios
dm writecache: return void from functions
dm kcopyd: use __GFP_HIGHMEM when allocating pages
dm writecache: set a default MAX_WRITEBACK_JOBS
Documentation: dm writecache: Render status list as list
Documentation: dm writecache: add blank line before optional parameters
dm snapshot: fix typo in snapshot_map() comment
dm raid: remove redundant "the" in parse_raid_params() comment
dm cache: fix typo in 2 comment blocks
...
Jakub Kicinski [Tue, 2 Aug 2022 20:47:52 +0000 (13:47 -0700)]
Merge branch 'wireguard-patches-for-5-20-rc1'
Jason A. Donenfeld says:
====================
wireguard patches for 5.20-rc1
I had planned to send these out eventually as net.git patches, but as
you emailed earlier, I figure there's no harm in just doing this now for
net-next.git. Please apply the following small fixes:
1) Rather than using msleep() in order to approximate ktime_get_coarse_
boottime_ns(), instead use an hrtimer, rounded heuristically.
2) An update in selftest config fragments, from Lukas.
3) Linus noticed that a debugging WARN_ON() to detect (impossible) stack
corruption would still allow the corruption to happen, making it harder
to get the report about the corruption subsequently.
4) Support for User Mode Linux in the test suite. This depends on some
UML patches that are slated for 5.20. Richard hasn't sent his pull
in, but they're in his tree, so I assume it'll happen.
====================
Link: https://lore.kernel.org/r/20220802125613.340848-1-Jason@zx2c4.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jason A. Donenfeld [Tue, 2 Aug 2022 12:56:13 +0000 (14:56 +0200)]
wireguard: selftests: support UML
This shoud open up various possibilities like time travel execution, and
is also just another platform to help shake out bugs.
Cc: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jason A. Donenfeld [Tue, 2 Aug 2022 12:56:12 +0000 (14:56 +0200)]
wireguard: allowedips: don't corrupt stack when detecting overflow
In case push_rcu() and related functions are buggy, there's a
WARN_ON(len >= 128), which the selftest tries to hit by being tricky. In
case it is hit, we shouldn't corrupt the kernel's stack, though;
otherwise it may be hard to even receive the report that it's buggy. So
conditionalize the stack write based on that WARN_ON()'s return value.
Note that this never *actually* happens anyway. The WARN_ON() in the
first place is bounded by IS_ENABLED(DEBUG), and isn't expected to ever
actually hit. This is just a debugging sanity check.
Additionally, hoist the constant 128 into a named enum,
MAX_ALLOWEDIPS_BITS, so that it's clear why this value is chosen.
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/all/CAHk-=wjJZGA6w_DxA+k7Ejbqsq+uGK==koPai3sqdsfJqemvag@mail.gmail.com/
Fixes:
e7096c131e51 ("net: WireGuard secure network tunnel")
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Lukas Bulwahn [Tue, 2 Aug 2022 12:56:11 +0000 (14:56 +0200)]
wireguard: selftests: update config fragments
The kernel.config and debug.config fragments in wireguard selftests mention
some config symbols that have been reworked:
Commit
c5665868183f ("mm: kmemleak: use the memory pool for early
allocations") removes the config DEBUG_KMEMLEAK_EARLY_LOG_SIZE and since
then, the config's feature is available without further configuration.
Commit
4675ff05de2d ("kmemcheck: rip it out") removes kmemcheck and the
corresponding arch config HAVE_ARCH_KMEMCHECK. There is no need for this
config.
Commit
3bf195ae6037 ("netfilter: nat: merge nf_nat_ipv4,6 into nat core")
removes the config NF_NAT_IPV4 and since then, the config's feature is
available without further configuration.
Commit
41a2901e7d22 ("rcu: Remove SPARSE_RCU_POINTER Kconfig option")
removes the config SPARSE_RCU_POINTER and since then, the config's feature
is enabled by default.
Commit
dfb4357da6dd ("time: Remove CONFIG_TIMER_STATS") removes the feature
and config CONFIG_TIMER_STATS without any replacement.
Commit
3ca17b1f3628 ("lib/ubsan: remove null-pointer checks") removes the
check and config UBSAN_NULL without any replacement.
Adjust the config fragments to those changes in configs.
Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jason A. Donenfeld [Tue, 2 Aug 2022 12:56:10 +0000 (14:56 +0200)]
wireguard: ratelimiter: use hrtimer in selftest
Using msleep() is problematic because it's compared against
ratelimiter.c's ktime_get_coarse_boottime_ns(), which means on systems
with slow jiffies (such as UML's forced HZ=100), the result is
inaccurate. So switch to using schedule_hrtimeout().
However, hrtimer gives us access only to the traditional posix timers,
and none of the _COARSE variants. So now, rather than being too
imprecise like jiffies, it's too precise.
One solution would be to give it a large "range" value, but this will
still fire early on a loaded system. A better solution is to align the
timeout to the actual coarse timer, and then round up to the nearest
tick, plus change.
So add the timeout to the current coarse time, and then
schedule_hrtimer() until the absolute computed time.
This should hopefully reduce flakes in CI as well. Note that we keep the
retry loop in case the entire function is running behind, because the
test could still be scheduled out, by either the kernel or by the
hypervisor's kernel, in which case restarting the test and hoping to not
be scheduled out still helps.
Fixes:
e7096c131e51 ("net: WireGuard secure network tunnel")
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Linus Torvalds [Tue, 2 Aug 2022 20:46:35 +0000 (13:46 -0700)]
Merge tag 'for-5.20/block-2022-07-29' of git://git.kernel.dk/linux-block
Pull block updates from Jens Axboe:
- Improve the type checking of request flags (Bart)
- Ensure queue mapping for a single queues always picks the right queue
(Bart)
- Sanitize the io priority handling (Jan)
- rq-qos race fix (Jinke)
- Reserved tags handling improvements (John)
- Separate memory alignment from file/disk offset aligment for O_DIRECT
(Keith)
- Add new ublk driver, userspace block driver using io_uring for
communication with the userspace backend (Ming)
- Use try_cmpxchg() to cleanup the code in various spots (Uros)
- Finally remove bdevname() (Christoph)
- Clean up the zoned device handling (Christoph)
- Clean up independent access range support (Christoph)
- Clean up and improve block sysfs handling (Christoph)
- Clean up and improve teardown of block devices.
This turns the usual two step process into something that is simpler
to implement and handle in block drivers (Christoph)
- Clean up chunk size handling (Christoph)
- Misc cleanups and fixes (Bart, Bo, Dan, GuoYong, Jason, Keith, Liu,
Ming, Sebastian, Yang, Ying)
* tag 'for-5.20/block-2022-07-29' of git://git.kernel.dk/linux-block: (178 commits)
ublk_drv: fix double shift bug
ublk_drv: make sure that correct flags(features) returned to userspace
ublk_drv: fix error handling of ublk_add_dev
ublk_drv: fix lockdep warning
block: remove __blk_get_queue
block: call blk_mq_exit_queue from disk_release for never added disks
blk-mq: fix error handling in __blk_mq_alloc_disk
ublk: defer disk allocation
ublk: rewrite ublk_ctrl_get_queue_affinity to not rely on hctx->cpumask
ublk: fold __ublk_create_dev into ublk_ctrl_add_dev
ublk: cleanup ublk_ctrl_uring_cmd
ublk: simplify ublk_ch_open and ublk_ch_release
ublk: remove the empty open and release block device operations
ublk: remove UBLK_IO_F_PREFLUSH
ublk: add a MAINTAINERS entry
block: don't allow the same type rq_qos add more than once
mmc: fix disk/queue leak in case of adding disk failure
ublk_drv: fix an IS_ERR() vs NULL check
ublk: remove UBLK_IO_F_INTEGRITY
ublk_drv: remove unneeded semicolon
...
Linus Torvalds [Tue, 2 Aug 2022 20:37:55 +0000 (13:37 -0700)]
Merge tag 'for-5.20/io_uring-zerocopy-send-2022-07-29' of git://git.kernel.dk/linux-block
Pull io_uring zerocopy support from Jens Axboe:
"This adds support for efficient support for zerocopy sends through
io_uring. Both ipv4 and ipv6 is supported, as well as both TCP and
UDP.
The core network changes to support this is in a stable branch from
Jakub that both io_uring and net-next has pulled in, and the io_uring
changes are layered on top of that.
All of the work has been done by Pavel"
* tag 'for-5.20/io_uring-zerocopy-send-2022-07-29' of git://git.kernel.dk/linux-block: (34 commits)
io_uring: notification completion optimisation
io_uring: export req alloc from core
io_uring/net: use unsigned for flags
io_uring/net: make page accounting more consistent
io_uring/net: checks errors of zc mem accounting
io_uring/net: improve io_get_notif_slot types
selftests/io_uring: test zerocopy send
io_uring: enable managed frags with register buffers
io_uring: add zc notification flush requests
io_uring: rename IORING_OP_FILES_UPDATE
io_uring: flush notifiers after sendzc
io_uring: sendzc with fixed buffers
io_uring: allow to pass addr into sendzc
io_uring: account locked pages for non-fixed zc
io_uring: wire send zc request type
io_uring: add notification slot registration
io_uring: add rsrc referencing for notifiers
io_uring: complete notifiers in tw
io_uring: cache struct io_notif
io_uring: add zc notification infrastructure
...
Linus Torvalds [Tue, 2 Aug 2022 20:27:23 +0000 (13:27 -0700)]
Merge tag 'for-5.20/io_uring-buffered-writes-2022-07-29' of git://git.kernel.dk/linux-block
Pull io_uring buffered writes support from Jens Axboe:
"This contains support for buffered writes, specifically for XFS. btrfs
is in progress, will be coming in the next release.
io_uring does support buffered writes on any file type, but since the
buffered write path just always -EAGAIN (or -EOPNOTSUPP) any attempt
to do so if IOCB_NOWAIT is set, any buffered write will effectively be
handled by io-wq offload. This isn't very efficient, and we even have
specific code in io-wq to serialize buffered writes to the same inode
to avoid further inefficiencies with thread offload.
This is particularly sad since most buffered writes don't block, they
simply copy data to a page and dirty it. With this pull request, we
can handle buffered writes a lot more effiently.
If balance_dirty_pages() needs to block, we back off on writes as
indicated.
This improves buffered write support by 2-3x.
Jan Kara helped with the mm bits for this, and Stefan handled the
fs/iomap/xfs/io_uring parts of it"
* tag 'for-5.20/io_uring-buffered-writes-2022-07-29' of git://git.kernel.dk/linux-block:
mm: honor FGP_NOWAIT for page cache page allocation
xfs: Add async buffered write support
xfs: Specify lockmode when calling xfs_ilock_for_iomap()
io_uring: Add tracepoint for short writes
io_uring: fix issue with io_write() not always undoing sb_start_write()
io_uring: Add support for async buffered writes
fs: Add async write file modification handling.
fs: Split off inode_needs_update_time and __file_update_time
fs: add __remove_file_privs() with flags parameter
fs: add a FMODE_BUF_WASYNC flags for f_mode
iomap: Return -EAGAIN from iomap_write_iter()
iomap: Add async buffered write support
iomap: Add flags parameter to iomap_page_create()
mm: Add balance_dirty_pages_ratelimited_flags() function
mm: Move updates of dirty_exceeded into one place
mm: Move starting of background writeback into the main balancing loop
Linus Torvalds [Tue, 2 Aug 2022 20:20:44 +0000 (13:20 -0700)]
Merge tag 'for-5.20/io_uring-2022-07-29' of git://git.kernel.dk/linux-block
Pull io_uring updates from Jens Axboe:
- As per (valid) complaint in the last merge window, fs/io_uring.c has
grown quite large these days. io_uring isn't really tied to fs
either, as it supports a wide variety of functionality outside of
that.
Move the code to io_uring/ and split it into files that either
implement a specific request type, and split some code into helpers
as well. The code is organized a lot better like this, and io_uring.c
is now < 4K LOC (me).
- Deprecate the epoll_ctl opcode. It'll still work, just trigger a
warning once if used. If we don't get any complaints on this, and I
don't expect any, then we can fully remove it in a future release
(me).
- Improve the cancel hash locking (Hao)
- kbuf cleanups (Hao)
- Efficiency improvements to the task_work handling (Dylan, Pavel)
- Provided buffer improvements (Dylan)
- Add support for recv/recvmsg multishot support. This is similar to
the accept (or poll) support for have for multishot, where a single
SQE can trigger everytime data is received. For applications that
expect to do more than a few receives on an instantiated socket, this
greatly improves efficiency (Dylan).
- Efficiency improvements for poll handling (Pavel)
- Poll cancelation improvements (Pavel)
- Allow specifiying a range for direct descriptor allocations (Pavel)
- Cleanup the cqe32 handling (Pavel)
- Move io_uring types to greatly cleanup the tracing (Pavel)
- Tons of great code cleanups and improvements (Pavel)
- Add a way to do sync cancelations rather than through the sqe -> cqe
interface, as that's a lot easier to use for some use cases (me).
- Add support to IORING_OP_MSG_RING for sending direct descriptors to a
different ring. This avoids the usually problematic SCM case, as we
disallow those. (me)
- Make the per-command alloc cache we use for apoll generic, place
limits on it, and use it for netmsg as well (me).
- Various cleanups (me, Michal, Gustavo, Uros)
* tag 'for-5.20/io_uring-2022-07-29' of git://git.kernel.dk/linux-block: (172 commits)
io_uring: ensure REQ_F_ISREG is set async offload
net: fix compat pointer in get_compat_msghdr()
io_uring: Don't require reinitable percpu_ref
io_uring: fix types in io_recvmsg_multishot_overflow
io_uring: Use atomic_long_try_cmpxchg in __io_account_mem
io_uring: support multishot in recvmsg
net: copy from user before calling __get_compat_msghdr
net: copy from user before calling __copy_msghdr
io_uring: support 0 length iov in buffer select in compat
io_uring: fix multishot ending when not polled
io_uring: add netmsg cache
io_uring: impose max limit on apoll cache
io_uring: add abstraction around apoll cache
io_uring: move apoll cache to poll.c
io_uring: consolidate hash_locked io-wq handling
io_uring: clear REQ_F_HASH_LOCKED on hash removal
io_uring: don't race double poll setting REQ_F_ASYNC_DATA
io_uring: don't miss setting REQ_F_DOUBLE_POLL
io_uring: disable multishot recvmsg
io_uring: only trace one of complete or overflow
...
Linus Torvalds [Tue, 2 Aug 2022 19:47:31 +0000 (12:47 -0700)]
Merge branch 'turbostat' of git://git./linux/kernel/git/lenb/linux
Pull turbostat updates from Len Brown:
"Only updating the turbostat tool here, no kernel changes"
* 'turbostat' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux:
tools/power turbostat: version 2022.07.28
tools/power turbostat: do not decode ACC for ICX and SPR
tools/power turbostat: fix SPR PC6 limits
tools/power turbostat: cleanup 'automatic_cstate_conversion_probe()'
tools/power turbostat: separate SPR from ICX
tools/power turbosstat: fix comment
tools/power turbostat: Support RAPTORLAKE P
tools/power turbostat: add support for ALDERLAKE_N
tools/power turbostat: dump secondary Turbo-Ratio-Limit
tools/power turbostat: simplify dump_turbo_ratio_limits()
tools/power turbostat: dump CPUID.7.EDX.Hybrid
tools/power turbostat: update turbostat.8
tools/power turbostat: Show uncore frequency
tools/power turbostat: Fix file pointer leak
tools/power turbostat: replace strncmp with single character compare
tools/power turbostat: print the kernel boot commandline
tools/power turbostat: Introduce support for RaptorLake
Linus Torvalds [Tue, 2 Aug 2022 18:27:53 +0000 (11:27 -0700)]
Merge tag 'thermal-5.20-rc1' of git://git./linux/kernel/git/rafael/linux-pm
Pull thermal control updates from Rafael Wysocki:
"These start a rework of the handling of trip points in the thermal
core, improve the cpufreq/devfreq cooling device handling, update some
thermal control drivers and the tmon utility and clean up code.
Specifics:
- Consolidate the thermal core code by beginning to move the thermal
trip structure from the thermal OF code as a generic structure to
be used by the different sensors when registering a thermal zone
(Daniel Lezcano).
- Make per cpufreq / devfreq cooling device ops instead of using a
global variable, fix comments and rework the trace information
(Lukasz Luba).
- Add the include/dt-bindings/thermal.h under the area covered by the
thermal maintainer in the MAINTAINERS file (Lukas Bulwahn).
- Improve the error output by giving the sensor identification when a
thermal zone failed to initialize, the DT bindings by changing the
positive logic and adding the r8a779f0 support on the rcar3
(Wolfram Sang).
- Convert the QCom tsens DT binding to the dtsformat format
(Krzysztof Kozlowski).
- Remove the pointless get_trend() function in the QCom, Ux500 and
tegra thermal drivers, along with the unused DROP_FULL and
RAISE_FULL trends definitions. Simplify the code by using clamp()
macros (Daniel Lezcano).
- Fix ref_table memory leak at probe time on the k3_j72xx bandgap
(Bryan Brattlof).
- Fix array underflow in prep_lookup_table (Dan Carpenter).
- Add static annotation to the k3_j72xx_bandgap_j7* data structure
(Jin Xiaoyun).
- Fix typos in comments detected on sun8i by Coccinelle (Julia
Lawall).
- Fix typos in comments on rzg2l (Biju Das).
- Remove as unnecessary call to dev_err() as the error is already
printed by the failing function on u8500 (Yang Li).
- Register the thermal zones as hwmon sensors for the Qcom thermal
sensors (Dmitry Baryshkov).
- Fix 'tmon' tool compilation issue by adding phtread.h include
(Markus Mayer).
- Fix typo in the comments for the 'tmon' tool (Slark Xiao).
- Make the thermal core use ida_alloc()/free() directly instead of
ida_simple_get()/ida_simple_remove() that have been deprecated
(keliu).
- Drop ACPI_FADT_LOW_POWER_S0 check from the Intel PCH thermal
control driver (Rafael Wysocki)"
* tag 'thermal-5.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (39 commits)
thermal/of: Initialize trip points separately
thermal/of: Use thermal trips stored in the thermal zone
thermal/core: Add thermal_trip in thermal_zone
thermal/core: Rename 'trips' to 'num_trips'
thermal/core: Move thermal_set_delay_jiffies to static
thermal/core: Remove unneeded EXPORT_SYMBOLS
thermal/of: Move thermal_trip structure to thermal.h
thermal/of: Remove the device node pointer for thermal_trip
thermal/of: Replace device node match with device node search
thermal/core: Remove duplicate information when an error occurs
thermal/core: Avoid calling ->get_trip_temp() unnecessarily
thermal/tools/tmon: Fix typo 'the the' in comment
thermal/tools/tmon: Include pthread and time headers in tmon.h
thermal/ti-soc-thermal: Fix comment typo
thermal/drivers/qcom/spmi-adc-tm5: Register thermal zones as hwmon sensors
thermal/drivers/qcom/temp-alarm: Register thermal zones as hwmon sensors
thermal/drivers/u8500: Remove unnecessary print function dev_err()
thermal/drivers/rzg2l: Fix comments
thermal/drivers/sun8i: Fix typo in comment
thermal/drivers/k3_j72xx_bandgap: Make k3_j72xx_bandgap_j721e_data and k3_j72xx_bandgap_j7200_data static
...
Linus Torvalds [Tue, 2 Aug 2022 18:17:00 +0000 (11:17 -0700)]
Merge tag 'pm-5.20-rc1' of git://git./linux/kernel/git/rafael/linux-pm
Pull power management updates from Rafael Wysocki:
"These are mostly minor improvements all over including new CPU IDs for
the Intel RAPL driver, an Energy Model rework to use micro-Watt as the
power unit, cpufreq fixes and cleanus, cpuidle updates, devfreq
updates, documentation cleanups and a new version of the pm-graph
suite of utilities.
Specifics:
- Make cpufreq_show_cpus() more straightforward (Viresh Kumar).
- Drop unnecessary CPU hotplug locking from store() used by cpufreq
sysfs attributes (Viresh Kumar).
- Make the ACPI cpufreq driver support the boost control interface on
Zhaoxin/Centaur processors (Tony W Wang-oc).
- Print a warning message on attempts to free an active cpufreq
policy which should never happen (Viresh Kumar).
- Fix grammar in the Kconfig help text for the loongson2 cpufreq
driver (Randy Dunlap).
- Use cpumask_var_t for an on-stack CPU mask in the ondemand cpufreq
governor (Zhao Liu).
- Add trace points for guest_halt_poll_ns grow/shrink to the haltpoll
cpuidle driver (Eiichi Tsukata).
- Modify intel_idle to treat C1 and C1E as independent idle states on
Sapphire Rapids (Artem Bityutskiy).
- Extend support for wakeirq to callback wrappers used during system
suspend and resume (Ulf Hansson).
- Defer waiting for device probe before loading a hibernation image
till the first actual device access to avoid possible deadlocks
reported by syzbot (Tetsuo Handa).
- Unify device_init_wakeup() for PM_SLEEP and !PM_SLEEP (Bjorn
Helgaas).
- Add Raptor Lake-P to the list of processors supported by the Intel
RAPL driver (George D Sworo).
- Add Alder Lake-N and Raptor Lake-P to the list of processors for
which Power Limit4 is supported in the Intel RAPL driver (Sumeet
Pawnikar).
- Make pm_genpd_remove() check genpd_debugfs_dir against NULL before
attempting to remove it (Hsin-Yi Wang).
- Change the Energy Model code to represent power in micro-Watts and
adjust its users accordingly (Lukasz Luba).
- Add new devfreq driver for Mediatek CCI (Cache Coherent
Interconnect) (Johnson Wang).
- Convert the Samsung Exynos SoC Bus bindings to DT schema of
exynos-bus.c (Krzysztof Kozlowski).
- Address kernel-doc warnings by adding the description for unused
function parameters in devfreq core (Mauro Carvalho Chehab).
- Use NULL to pass a null pointer rather than zero according to the
function propotype in imx-bus.c (Colin Ian King).
- Print error message instead of error interger value in
tegra30-devfreq.c (Dmitry Osipenko).
- Add checks to prevent setting negative frequency QoS limits for
CPUs (Shivnandan Kumar).
- Update the pm-graph suite of utilities to the latest revision 5.9
including multiple improvements (Todd Brandt).
- Drop pme_interrupt reference from the PCI power management
documentation (Mario Limonciello)"
* tag 'pm-5.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (27 commits)
powercap: RAPL: Add Power Limit4 support for Alder Lake-N and Raptor Lake-P
PM: QoS: Add check to make sure CPU freq is non-negative
PM: hibernate: defer device probing when resuming from hibernation
intel_idle: make SPR C1 and C1E be independent
cpufreq: ondemand: Use cpumask_var_t for on-stack cpu mask
cpufreq: loongson2: fix Kconfig "its" grammar
pm-graph v5.9
cpufreq: Warn users while freeing active policy
cpufreq: scmi: Support the power scale in micro-Watts in SCMI v3.1
firmware: arm_scmi: Get detailed power scale from perf
Documentation: EM: Switch to micro-Watts scale
PM: EM: convert power field to micro-Watts precision and align drivers
PM / devfreq: tegra30: Add error message for devm_devfreq_add_device()
PM / devfreq: imx-bus: use NULL to pass a null pointer rather than zero
PM / devfreq: shut up kernel-doc warnings
dt-bindings: interconnect: samsung,exynos-bus: convert to dtschema
PM / devfreq: mediatek: Introduce MediaTek CCI devfreq driver
dt-bindings: interconnect: Add MediaTek CCI dt-bindings
PM: domains: Ensure genpd_debugfs_dir exists before remove
PM: runtime: Extend support for wakeirq for force_suspend|resume
...
Linus Torvalds [Tue, 2 Aug 2022 18:12:25 +0000 (11:12 -0700)]
Merge tag 'acpi-5.20-rc1' of git://git./linux/kernel/git/rafael/linux-pm
Pull ACPI updates from Rafael Wysocki:
"These rework the handling of ACPI device objects to use the driver
core facilities for managing child ones instead of some questionable
home-grown ways without the requisite locking and reference counting,
clean up the EC driver, improve suspend-to-idle handling on x86, add
some systems to the ACPI backlight quirk list, fix some assorted
issues, clean up code and improve documentation.
Specifics:
- Use facilities provided by the driver core and some additional
helpers to handle the children of a given ACPI device object in
multiple places instead of using the children and node list heads
in struct acpi_device which is error prone (Rafael Wysocki).
- Fix ACPI-related device reference counting issue in the hisi_lpc
bus driver (Yang Yingliang).
- Drop the children and node list heads that are not needed any more
from struct acpi_device (Rafael Wysocki).
- Drop driver member from struct acpi_device (Uwe Kleine-König).
- Drop redundant check from acpi_device_remove() (Uwe Kleine-König).
- Prepare the CPPC library for handling backwards-compatible future
_CPC return package formats gracefully (Rafael Wysocki).
- Clean up the ACPI EC driver after previous changes in it (Hans de
Goede).
- Drop leftover acpi_processor_get_limit_info() declaration (Riwen
Lu).
- Split out thermal initialization from ACPI PSS (Riwen Lu).
- Annotate more functions in the ACPI CPU idle driver to live in the
cpuidle section (Guilherme G. Piccoli).
- Fix _EINJ vs "special purpose" EFI memory regions (Dan Williams).
- Implement a better fix to avoid spamming the console with old error
logs (Tony Luck).
- Fix typo in a comment in the APEI code (Xiang wangx).
- Save NVS memory during transitions into S3 on Lenovo G40-45 (Manyi
Li).
- Add support for upcoming AMD uPEP device ID AMDI008 to the ACPI
suspend-to-idle driver for x86 platforms (Shyam Sundar S K).
- Clean up checks related to the ACPI_FADT_LOW_POWER_S0 platform flag
in the LPIT table driver and the suspend-to-idle driver for x86
platforms (Rafael Wysocki).
- Print information messages regarding declared LPS0 idle support in
the platform firmware (Rafael Wysocki).
- Fix missing check in register_device_clock() in the ACPI driver for
Intel SoCs (huhai).
- Fix ACS setup in the VIOT table parser (Eric Auger).
- Skip IRQ override on AMD Zen platforms where it's harmful
(Chuanhong Guo).
- Use native backlight on Dell Inspiron N4010 (Hans de Goede).
- Use native backlight on some TongFang devices (Werner Sembach).
- Drop X86 dependency from the ACPI backlight driver Kconfig (Riwen
Lu).
- Shorten the quirk list in the ACPI backlight driver by identifying
Clevo by board_name only (Werner Sembach).
- Remove useless NULL pointer checks from 2 ACPI PCI link management
functions (Andrey Strachuk).
- Fix obsolete example in the ACPI EINJ documentation (Qifu Zhang).
- Update links and references to _DSD-related documents (Sudeep
Holla)"
* tag 'acpi-5.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (46 commits)
ACPI/PCI: Remove useless NULL pointer checks
ACPI: CPPC: Do not prevent CPPC from working in the future
ACPI: PM: x86: Print messages regarding LPS0 idle support
ACPI: resource: skip IRQ override on AMD Zen platforms
Documentation: ACPI: EINJ: Fix obsolete example
ACPI: video: Use native backlight on Dell Inspiron N4010
ACPI: PM: s2idle: Use LPS0 idle if ACPI_FADT_LOW_POWER_S0 is unset
Revert "ACPI / PM: LPIT: Register sysfs attributes based on FADT"
ACPI: video: Shortening quirk list by identifying Clevo by board_name only
ACPI: video: Force backlight native for some TongFang devices
ACPI: PM: s2idle: Add support for upcoming AMD uPEP HID AMDI008
ACPI: VIOT: Fix ACS setup
ACPI: bus: Drop unused list heads from struct acpi_device
hisi_lpc: Use acpi_dev_for_each_child()
bus: hisi_lpc: fix missing platform_device_put() in hisi_lpc_acpi_probe()
ACPI: bus: Drop driver member of struct acpi_device
ACPI: bus: Drop redundant check in acpi_device_remove()
ACPI: APEI: Fix _EINJ vs EFI_MEMORY_SP
ACPI: LPSS: Fix missing check in register_device_clock()
ACPI: APEI: Better fix to avoid spamming the console with old error logs
...
Linus Torvalds [Tue, 2 Aug 2022 18:07:04 +0000 (11:07 -0700)]
Merge tag 'hwmon-for-v5.20' of git://git./linux/kernel/git/groeck/linux-staging
Pull hwmon updates from Guenter Roeck:
- Substantial rewrite of lm90 driver to support several additional
chips and improve support for existing chips.
- Add support of ROG ZENITH II EXTREME, Maximus XI Hero, and
Strix Z690-a D4 to asus-ec-sensors driver
- Add support of
F71858AD to f71882fg driver
- Add support of Aquacomputer Quadro to aquacomputer_d5next driver
- Improved assembler code and add support for Dell G5 5590 as well as
XPS 13 7390 in dell-smm driver
- Add support for ASUS TUF GAMING B550-PLUS WIFI II to nct775 driver
- Add support for IEEE 754 half precision to PMBus core. Also support
for Analog Devices LT7182S, improve regulator support, and report
various MFR register values in debugfs.
- Various other minor improvements and fixes
* tag 'hwmon-for-v5.20' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging: (85 commits)
hwmon: (aquacomputer_d5next) Add support for Aquacomputer Quadro fan controller
hwmon: (dell-smm) Improve documentation
hwmon: (nct6775) add ASUS TUF GAMING B550-PLUS WIFI II
hwmon: (occ) Replace open-coded variant of %*phN specifier
hwmon: (sht15) Fix wrong assumptions in device remove callback
hwmon: (aquacomputer_d5next) Add support for reading the +12V voltage sensor on D5 Next
hwmon: (tps23861) fix byte order in current and voltage registers
hwmon: (aspeed-pwm-tacho) increase fan tach period (again)
hwmon: (aquacomputer_d5next) Add D5 Next fan control support
hwmon: (mcp3021) improve driver support for newer hwmon interface
hwmon: (asus-ec-sensors) add definitions for ROG ZENITH II EXTREME
hwmon: (aquacomputer_d5next) Move device-specific data into struct aqc_data
hwmon: (asus-ec-sensors) add missing sensors for X570-I GAMING
hwmon: (drivetemp) Add module alias
hwmon: (asus_wmi_sensors) Save a few bytes of memory
hwmon: (lm90) Use worker for alarm notifications
hwmon: (asus-ec-sensors) add support for Maximus XI Hero
hwmon: (dell-smm) Improve assembly code
hwmon: (pmbus/ltc2978) Set voltage resolution
hwmon: (pmbus) Add list_voltage to pmbus ops
...
Linus Torvalds [Tue, 2 Aug 2022 18:04:41 +0000 (11:04 -0700)]
Merge tag 'pwm/for-5.20-rc1' of git://git./linux/kernel/git/thierry.reding/linux-pwm
Pull pwm updates from Thierry Reding:
"After v5.19 had all drivers converted to the new atomic API and nobody
has reported any breakage, this set of changes starts by dropping the
legacy support.
Some existing drivers get improvements and broader chip support and a
new driver is added that emulates a PWM controller using a clock
output.
Other than that there's the usual bits of cleanups and minor fixes"
* tag 'pwm/for-5.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/thierry.reding/linux-pwm: (21 commits)
pwm: lpc18xx: Fix period handling
pwm: lpc18xx: Convert to use dev_err_probe()
pwm: twl-led: Document some limitations and link to the reference manual
MAINTAINERS: Remove myself as PWM maintainer
MAINTAINERS: Add include/dt-bindings/pwm to PWM SUBSYSTEM
dt-bindings: pwm: mediatek: Add compatible string for MT8195
pwm: Add clock based PWM output driver
dt-bindings: pwm: Document clk based PWM controller
pwm: sifive: Shut down hardware only after pwmchip_remove() completed
pwm: sifive: Ensure the clk is enabled exactly once per running PWM
pwm: sifive: Simplify clk handling
pwm: sifive: Enable clk only after period check in .apply()
pwm: sifive: Reduce time the controller lock is held
pwm: sifive: Fold pwm_sifive_enable() into its only caller
pwm: sifive: Simplify offset calculation for PWMCMP registers
pwm: mediatek: Add MT8365 support
dt-bindings: pwm: Add MT8365 SoC binding
pwm: Drop unused forward declaration from pwm.h
pwm: Reorder header file to get rid of struct pwm_capture forward declaration
pwm: atmel-tcb: Fix typo in comment
...
Linus Torvalds [Tue, 2 Aug 2022 17:55:04 +0000 (10:55 -0700)]
Merge tag 'spi-v5.20' of git://git./linux/kernel/git/broonie/spi
Pull spi updates from Mark Brown:
"The big update this time around is some excellent work from David
Jander who went through the fast path and really eliminated overheads,
meaning that we are seeing a huge reduction in the time spent between
transfers for single threaded clients.
Benchmarking has been coming out at about a halving of overhead which
is clearly visible in system level usage that stresses SPI like some
CAN and IIO applications, especially with small transfers. Thanks to
David for taking the time to drill down into this and push the work
upstream.
Otherwise there's been a bunch of new device support and the usual
updates.
- Optimisation of the fast path, particularly around the number and
types of locking operations, from David Jander.
- Support for Arbel NPCM845, HP GXP, Intel Meteor Lake and Thunder
Bay, MediaTek MT8188 and MT8365, Microchip FPGAs, nVidia Tegra 241
and Samsung Exynos Auto v9 and 4210"
* tag 'spi-v5.20' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi: (97 commits)
MAINTAINERS: add spi support to GXP
spi: dt-bindings: add documentation for hpe,gxp-spifi
spi: spi-gxp: Add support for HPE GXP SoCs
spi: a3700: support BE for AC5 SPI driver
spi/panel: dt-bindings: drop CPHA and CPOL from common properties
spi: bcm2835: enable shared interrupt support
spi: dt-bindings: spi-controller: correct example indentation
spi: dt-bindings: qcom,spi-geni-qcom: allow three interconnects
spi: npcm-fiu: Add NPCM8XX support
dt-binding: spi: Add npcm845 compatible to npcm-fiu document
spi: npcm-fiu: Modify direct read dummy configuration
spi: atmel: remove #ifdef CONFIG_{PM, SLEEP}
spi: dt-bindings: Add compatible for MediaTek MT8188
spi: dt-bindings: mediatek,spi-mtk-nor: Update bindings for nor flash
spi: dt-bindings: atmel,at91rm9200-spi: convert to json-schema
spi: tegra20-slink: fix UAF in tegra_slink_remove()
spi: Fix simplification of devm_spi_register_controller
spi: microchip-core: switch to use dev_err_probe()
spi: microchip-core: switch to use devm_spi_alloc_master()
spi: microchip-core: fix UAF in mchp_corespi_remove()
...
Linus Torvalds [Tue, 2 Aug 2022 17:23:10 +0000 (10:23 -0700)]
Merge tag 'regulator-v5.20' of git://git./linux/kernel/git/broonie/regulator
Pull regulator updates from Mark Brown:
"This has been a fairly quiet release for the regulator API, a few new
drivers and a small API update:
- Support for specifying an initial load as part of requesting
regulators through the bulk API
- Support for Maxim MAX597x, Qualcomm PM8074, PM8909 and Realtek
RT5120 devices"
* tag 'regulator-v5.20' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator: (35 commits)
regulator: core: Allow drivers to define their init data as const
regulator: core: Allow specifying an initial load w/ the bulk API
regulator: mt6380: Fix unused array warning
regulator: Add missing type for 'regulator-microvolt-offset'
regulator: core: Fix off-on-delay-us for always-on/boot-on regulators
regulator: of: Fix refcount leak bug in of_get_regulation_constraints()
regulator: pwm: Update Lee Jones' email address
regulator: max597x: Don't return uninitialized variable in .probe
regulator: qcom,spmi-regulator: add PMP8074 PMIC
regulator: qcom,spmi-regulator: Convert to dtschema
regulator: qcom_spmi: add support for PMP8074 regulators
regulator: qcom_spmi: add support for HT_P600
regulator: qcom_spmi: add support for HT_P150
regulator: max597x: Remove unused including <linux/version.h>
regulator: Fix MFD_MAX597X dependency
regulator: Fix parameter declaration and spelling mistake.
regulator: max597x: Add support for max597x regulator
regulator: scmi: Add missing of_node_get()
regulator: qcom_smd: Add PM8909 RPM regulators
regulator: dt-bindings: qcom,smd-rpm: Add PM8909
...
Linus Torvalds [Tue, 2 Aug 2022 17:12:25 +0000 (10:12 -0700)]
Merge tag 'regmap-v5.20' of git://git./linux/kernel/git/broonie/regmap
Pull regmap updates from Mark Brown:
"The big thing this release is a big cleanup of the interrupt code from
Aidan MacDonald, plus a few new API updates:
- Rework of the interrupt code, making it much simpler and easier to
extend
- Support for device specific update bits operations with devices
that otherwise use bitstream interfaces
- Support for bit operations on fields as well as whole registers"
* tag 'regmap-v5.20' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap:
regmap: permit to set reg_update_bits with bulk implementation
regmap: add WARN_ONCE when invalid mask is provided to regmap_field_init()
regmap-irq: Fix bug in regmap_irq_get_irq_reg_linear()
regmap: cache: Add extra parameter check in regcache_init
regmap-irq: Deprecate the not_fixed_stride flag
regmap-irq: Add get_irq_reg() callback
regmap-irq: Fix inverted handling of unmask registers
regmap-irq: Deprecate type registers and virtual registers
regmap-irq: Introduce config registers for irq types
regmap-irq: Refactor checks for status bulk read support
regmap-irq: Remove mask_writeonly and regmap_irq_update_bits()
regmap-irq: Remove inappropriate uses of regmap_irq_update_bits()
regmap-irq: Remove an unnecessary restriction on type_in_mask
regmap-irq: Cleanup sizeof(...) use in memory allocation
regmap-irq: Remove unused type_reg_stride field
regmap-irq: Convert bool bitfields to unsigned int
regmap: Don't warn about cache only mode for devices with no cache
regmap: provide regmap_field helpers for simple bit operations
regmap: cache: Fix syntax errors in comments
Christoph Hellwig [Mon, 13 Jun 2022 05:37:15 +0000 (07:37 +0200)]
fs: remove the NULL get_block case in mpage_writepages
No one calls mpage_writepages with a NULL get_block paramter, so remove
support for that case.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Christoph Hellwig [Mon, 13 Jun 2022 05:37:14 +0000 (07:37 +0200)]
fs: don't call ->writepage from __mpage_writepage
All callers of mpage_writepage use block_write_full_page as their
->writepage implementation when called from mpage_writepages
(although for ntfs3 this is obsfucated a bit).
Just call block_write_full_page directly instead of going through
the ->writepage indirection.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Christoph Hellwig [Mon, 13 Jun 2022 05:37:13 +0000 (07:37 +0200)]
fs: remove the nobh helpers
All callers are gone, so remove the now dead code.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Christoph Hellwig [Mon, 13 Jun 2022 05:37:12 +0000 (07:37 +0200)]
jfs: stop using the nobh helper
The nobh mode is an obscure feature to save lowlevel for large memory
32-bit configurations while trading for much slower performance and
has been long obsolete. Switch to the regular buffer head based helpers
instead.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Christoph Hellwig [Mon, 13 Jun 2022 05:37:11 +0000 (07:37 +0200)]
ext2: remove nobh support
The nobh mode is an obscure feature to save lowlevel for large memory
32-bit configurations while trading for much slower performance and
has been long obsolete. Remove it.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Christoph Hellwig [Mon, 13 Jun 2022 05:37:10 +0000 (07:37 +0200)]
ntfs3: refactor ntfs_writepages
Handle the resident case with an explicit generic_writepages call instead
of using the obscure overload that makes mpage_writepages with a NULL
get_block do the same thing.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Matthew Wilcox (Oracle) [Mon, 6 Jun 2022 17:29:10 +0000 (13:29 -0400)]
mm/folio-compat: Remove migration compatibility functions
migrate_page_move_mapping(), migrate_page_copy() and migrate_page_states()
are all now unused after converting all the filesystems from
aops->migratepage() to aops->migrate_folio().
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Matthew Wilcox (Oracle) [Mon, 6 Jun 2022 15:53:31 +0000 (11:53 -0400)]
fs: Remove aops->migratepage()
With all users converted to migrate_folio(), remove this operation.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Matthew Wilcox (Oracle) [Mon, 6 Jun 2022 15:30:43 +0000 (11:30 -0400)]
secretmem: Convert to migrate_folio
This is little more than changing the types over; there's no real work
being done in this function.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Matthew Wilcox (Oracle) [Mon, 6 Jun 2022 14:47:21 +0000 (10:47 -0400)]
hugetlb: Convert to migrate_folio
This involves converting migrate_huge_page_move_mapping(). We also need a
folio variant of hugetlb_set_page_subpool(), but that's for a later patch.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Acked-by: Muchun Song <songmuchun@bytedance.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>