Hou Tao [Wed, 16 Nov 2022 07:50:58 +0000 (15:50 +0800)]
bpf: Pass map file to .map_update_batch directly
Currently bpf_map_do_batch() first invokes fdget(batch.map_fd) to get
the target map file, then it invokes generic_map_update_batch() to do
batch update. generic_map_update_batch() will get the target map file
by using fdget(batch.map_fd) again and pass it to bpf_map_update_value().
The problem is map file returned by the second fdget() may be NULL or a
totally different file compared by map file in bpf_map_do_batch(). The
reason is that the first fdget() only guarantees the liveness of struct
file instead of file descriptor and the file description may be released
by concurrent close() through pick_file().
It doesn't incur any problem as for now, because maps with batch update
support don't use map file in .map_fd_get_ptr() ops. But it is better to
fix the potential access of an invalid map file.
Using __bpf_map_get() again in generic_map_update_batch() can not fix
the problem, because batch.map_fd may be closed and reopened, and the
returned map file may be different with map file got in
bpf_map_do_batch(), so just passing the map file directly to
.map_update_batch() in bpf_map_do_batch().
Signed-off-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20221116075059.1551277-1-houtao@huaweicloud.com
Daniel Müller [Wed, 16 Nov 2022 17:43:58 +0000 (17:43 +0000)]
bpf/docs: Include blank lines between bullet points in bpf_devel_QA.rst
Commit
26a9b433cf08 ("bpf/docs: Document how to run CI without patch
submission") caused a warning to be generated when compiling the
documentation:
> bpf_devel_QA.rst:55: WARNING: Unexpected indentation.
> bpf_devel_QA.rst:56: WARNING: Block quote ends without a blank line
This change fixes the problem by inserting the required blank lines.
Fixes:
26a9b433cf08 ("bpf/docs: Document how to run CI without patch submission")
Reported-by: Akira Yokosawa <akiyks@gmail.com>
Signed-off-by: Daniel Müller <deso@posteo.net>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Akira Yokosawa <akiyks@gmail.com>
Link: https://lore.kernel.org/bpf/20221116174358.2744613-1-deso@posteo.net
Wang Yufen [Tue, 15 Nov 2022 03:29:40 +0000 (11:29 +0800)]
selftests/bpf: fix memory leak of lsm_cgroup
kmemleak reports this issue:
unreferenced object 0xffff88810b7835c0 (size 32):
comm "test_progs", pid 270, jiffies
4294969007 (age 1621.315s)
hex dump (first 32 bytes):
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
03 00 00 00 03 00 00 00 0f 00 00 00 00 00 00 00 ................
backtrace:
[<
00000000376cdeab>] kmalloc_trace+0x27/0x110
[<
000000003bcdb3b6>] selinux_sk_alloc_security+0x66/0x110
[<
000000003959008f>] security_sk_alloc+0x47/0x80
[<
00000000e7bc6668>] sk_prot_alloc+0xbd/0x1a0
[<
0000000002d6343a>] sk_alloc+0x3b/0x940
[<
000000009812a46d>] unix_create1+0x8f/0x3d0
[<
000000005ed0976b>] unix_create+0xa1/0x150
[<
0000000086a1d27f>] __sock_create+0x233/0x4a0
[<
00000000cffe3a73>] __sys_socket_create.part.0+0xaa/0x110
[<
0000000007c63f20>] __sys_socket+0x49/0xf0
[<
00000000b08753c8>] __x64_sys_socket+0x42/0x50
[<
00000000b56e26b3>] do_syscall_64+0x3b/0x90
[<
000000009b4871b8>] entry_SYSCALL_64_after_hwframe+0x63/0xcd
The issue occurs in the following scenarios:
unix_create1()
sk_alloc()
sk_prot_alloc()
security_sk_alloc()
call_int_hook()
hlist_for_each_entry()
entry1->hook.sk_alloc_security
<-- selinux_sk_alloc_security() succeeded,
<-- sk->security alloced here.
entry2->hook.sk_alloc_security
<-- bpf_lsm_sk_alloc_security() failed
goto out_free;
... <-- the sk->security not freed, memleak
The core problem is that the LSM is not yet fully stacked (work is
actively going on in this space) which means that some LSM hooks do
not support multiple LSMs at the same time. To fix, skip the
"EPERM" test when it runs in the environments that already have
non-bpf lsms installed
Fixes:
dca85aac8895 ("selftests/bpf: lsm_cgroup functional test")
Signed-off-by: Wang Yufen <wangyufen@huawei.com>
Cc: Stanislav Fomichev <sdf@google.com>
Acked-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/r/1668482980-16163-1-git-send-email-wangyufen@huawei.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Eduard Zingerman [Wed, 16 Nov 2022 01:54:56 +0000 (03:54 +0200)]
selftests/bpf: allow unpriv bpf for selftests by default
Enable unprivileged bpf for selftests kernel by default.
This forces CI to run test_verifier tests in both privileged
and unprivileged modes.
The test_verifier.c:do_test uses sysctl kernel.unprivileged_bpf_disabled
to decide whether to run or to skip test cases in unprivileged mode.
The CONFIG_BPF_UNPRIV_DEFAULT_OFF controls the default value of the
kernel.unprivileged_bpf_disabled.
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20221116015456.2461135-1-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Tiezhu Yang [Tue, 15 Nov 2022 13:00:07 +0000 (21:00 +0800)]
bpftool: Check argc first before "file" in do_batch()
If the parameters for batch are more than 2, check argc first can
return immediately, no need to use is_prefix() to check "file" with
a little overhead and then check argc, it is better to check "file"
only when the parameters for batch are 2.
Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Acked-by: Stanislav Fomichev <sdf@google.com>
Reviewed-by: Quentin Monnet <quentin@isovalent.com>
Link: https://lore.kernel.org/r/1668517207-11822-1-git-send-email-yangtiezhu@loongson.cn
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Donald Hunter [Tue, 15 Nov 2022 09:59:10 +0000 (09:59 +0000)]
docs/bpf: Fix sample code in MAP_TYPE_ARRAY docs
Remove mistaken & from code example in MAP_TYPE_ARRAY docs
Fixes:
1cfa97b30c5a ("bpf, docs: Document BPF_MAP_TYPE_ARRAY")
Signed-off-by: Donald Hunter <donald.hunter@gmail.com>
Link: https://lore.kernel.org/r/20221115095910.86407-1-donald.hunter@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Alexei Starovoitov [Wed, 16 Nov 2022 01:38:36 +0000 (17:38 -0800)]
Merge branch 'propagate nullness information for reg to reg comparisons'
Eduard Zingerman says:
====================
This patchset adds ability to propagates nullness information for
branches of register to register equality compare instructions. The
following rules are used:
- suppose register A maybe null
- suppose register B is not null
- for JNE A, B, ... - A is not null in the false branch
- for JEQ A, B, ... - A is not null in the true branch
E.g. for program like below:
r6 = skb->sk;
r7 = sk_fullsock(r6);
r0 = sk_fullsock(r6);
if (r0 == 0) return 0; (a)
if (r0 != r7) return 0; (b)
*r7->type; (c)
return 0;
It is safe to dereference r7 at point (c), because of (a) and (b).
The utility of this change came up while working on BPF CLang backend
issue [1]. Specifically, while debugging issue with selftest
`test_sk_lookup.c`. This test has the following structure:
int access_ctx_sk(struct bpf_sk_lookup *ctx __CTX__)
{
struct bpf_sock *sk1 = NULL, *sk2 = NULL;
...
sk1 = bpf_map_lookup_elem(&redir_map, &KEY_SERVER_A);
if (!sk1) // (a)
goto out;
...
if (ctx->sk != sk1) // (b)
goto out;
...
if (ctx->sk->family != AF_INET || // (c)
ctx->sk->type != SOCK_STREAM ||
ctx->sk->state != BPF_TCP_LISTEN)
goto out;
...
}
- at (a) `sk1` is checked to be not null;
- at (b) `ctx->sk` is verified to be equal to `sk1`;
- at (c) `ctx->sk` is accessed w/o nullness check.
Currently Global Value Numbering pass considers expressions `sk1` and
`ctx->sk` to be identical at point (c) and replaces `ctx->sk` with
`sk1` (not expressions themselves but corresponding SSA values).
Since `sk1` is known to be not null after (b) verifier allows
execution of the program.
However, such optimization is not guaranteed to happen. When it does
not happen verifier reports an error.
Changelog:
v2 -> v3:
- verifier tests are updated with correct error message for
unprivileged mode (pointer comparisons are forbidden in
unprivileged mode).
v1 -> v2:
- after investigation described in [2] as suggested by John, Daniel
and Shung-Hsi, function `type_is_pointer` is removed, calls to this
function are replaced by `__is_pointer_value(false, src_reg)`.
RFC -> v1:
- newly added if block in `check_cond_jmp_op` is moved down to keep
`make_ptr_not_null_reg` actions together;
- tests rewritten to have a single `r0 = 0; exit;` block.
[1] https://reviews.llvm.org/D131633#3722231
[2] https://lore.kernel.org/bpf/
bad8be826d088e0d180232628160bf932006de89.camel@gmail.com/
[RFC] https://lore.kernel.org/bpf/
20220822094312.175448-1-eddyz87@gmail.com/
[v1] https://lore.kernel.org/bpf/
20220826172915.1536914-1-eddyz87@gmail.com/
[v2] https://lore.kernel.org/bpf/
20221106214921.117631-1-eddyz87@gmail.com/
====================
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Eduard Zingerman [Tue, 15 Nov 2022 22:48:59 +0000 (00:48 +0200)]
selftests/bpf: check nullness propagation for reg to reg comparisons
Verify that nullness information is porpagated in the branches of
register to register JEQ and JNE operations.
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/r/20221115224859.2452988-3-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Eduard Zingerman [Tue, 15 Nov 2022 22:48:58 +0000 (00:48 +0200)]
bpf: propagate nullness information for reg to reg comparisons
Propagate nullness information for branches of register to register
equality compare instructions. The following rules are used:
- suppose register A maybe null
- suppose register B is not null
- for JNE A, B, ... - A is not null in the false branch
- for JEQ A, B, ... - A is not null in the true branch
E.g. for program like below:
r6 = skb->sk;
r7 = sk_fullsock(r6);
r0 = sk_fullsock(r6);
if (r0 == 0) return 0; (a)
if (r0 != r7) return 0; (b)
*r7->type; (c)
return 0;
It is safe to dereference r7 at point (c), because of (a) and (b).
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/r/20221115224859.2452988-2-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Toke Høiland-Jørgensen [Tue, 8 Nov 2022 14:06:00 +0000 (15:06 +0100)]
bpf: Expand map key argument of bpf_redirect_map to u64
For queueing packets in XDP we want to add a new redirect map type with
support for 64-bit indexes. To prepare fore this, expand the width of the
'key' argument to the bpf_redirect_map() helper. Since BPF registers are
always 64-bit, this should be safe to do after the fact.
Acked-by: Song Liu <song@kernel.org>
Reviewed-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://lore.kernel.org/r/20221108140601.149971-3-toke@redhat.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Toke Høiland-Jørgensen [Tue, 8 Nov 2022 14:05:59 +0000 (15:05 +0100)]
dev: Move received_rps counter next to RPS members in softnet data
Move the received_rps counter value next to the other RPS-related members
in softnet_data. This closes two four-byte holes in the structure, making
room for another pointer in the first two cache lines without bumping the
xmit struct to its own line.
Acked-by: Song Liu <song@kernel.org>
Reviewed-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://lore.kernel.org/r/20221108140601.149971-2-toke@redhat.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Daniel Müller [Mon, 14 Nov 2022 21:15:01 +0000 (21:15 +0000)]
bpf/docs: Document how to run CI without patch submission
This change documents the process for running the BPF CI before
submitting a patch to the upstream mailing list, similar to what happens
if a patch is send to bpf@vger.kernel.org: it builds kernel and
selftests and runs the latter on different architecture (but it notably
does not cover stylistic checks such as cover letter verification).
Running BPF CI this way can help achieve better test coverage ahead of
patch submission than merely running locally (say, using
tools/testing/selftests/bpf/vmtest.sh), as additional architectures may
be covered as well.
Signed-off-by: Daniel Müller <deso@posteo.net>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20221114211501.2068684-1-deso@posteo.net
Kumar Kartikeya Dwivedi [Mon, 14 Nov 2022 19:15:28 +0000 (00:45 +0530)]
bpf: Refactor btf_struct_access
Instead of having to pass multiple arguments that describe the register,
pass the bpf_reg_state into the btf_struct_access callback. Currently,
all call sites simply reuse the btf and btf_id of the reg they want to
check the access of. The only exception to this pattern is the callsite
in check_ptr_to_map_access, hence for that case create a dummy reg to
simulate PTR_TO_BTF_ID access.
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20221114191547.1694267-8-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Kumar Kartikeya Dwivedi [Mon, 14 Nov 2022 19:15:27 +0000 (00:45 +0530)]
bpf: Rename MEM_ALLOC to MEM_RINGBUF
Currently, verifier uses MEM_ALLOC type tag to specially tag memory
returned from bpf_ringbuf_reserve helper. However, this is currently
only used for this purpose and there is an implicit assumption that it
only refers to ringbuf memory (e.g. the check for ARG_PTR_TO_ALLOC_MEM
in check_func_arg_reg_off).
Hence, rename MEM_ALLOC to MEM_RINGBUF to indicate this special
relationship and instead open the use of MEM_ALLOC for more generic
allocations made for user types.
Also, since ARG_PTR_TO_ALLOC_MEM_OR_NULL is unused, simply drop it.
Finally, update selftests using 'alloc_' verifier string to 'ringbuf_'.
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20221114191547.1694267-7-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Kumar Kartikeya Dwivedi [Mon, 14 Nov 2022 19:15:26 +0000 (00:45 +0530)]
bpf: Rename RET_PTR_TO_ALLOC_MEM
Currently, the verifier has two return types, RET_PTR_TO_ALLOC_MEM, and
RET_PTR_TO_ALLOC_MEM_OR_NULL, however the former is confusingly named to
imply that it carries MEM_ALLOC, while only the latter does. This causes
confusion during code review leading to conclusions like that the return
value of RET_PTR_TO_DYNPTR_MEM_OR_NULL (which is RET_PTR_TO_ALLOC_MEM |
PTR_MAYBE_NULL) may be consumable by bpf_ringbuf_{submit,commit}.
Rename it to make it clear MEM_ALLOC needs to be tacked on top of
RET_PTR_TO_MEM.
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20221114191547.1694267-6-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Kumar Kartikeya Dwivedi [Mon, 14 Nov 2022 19:15:25 +0000 (00:45 +0530)]
bpf: Support bpf_list_head in map values
Add the support on the map side to parse, recognize, verify, and build
metadata table for a new special field of the type struct bpf_list_head.
To parameterize the bpf_list_head for a certain value type and the
list_node member it will accept in that value type, we use BTF
declaration tags.
The definition of bpf_list_head in a map value will be done as follows:
struct foo {
struct bpf_list_node node;
int data;
};
struct map_value {
struct bpf_list_head head __contains(foo, node);
};
Then, the bpf_list_head only allows adding to the list 'head' using the
bpf_list_node 'node' for the type struct foo.
The 'contains' annotation is a BTF declaration tag composed of four
parts, "contains:name:node" where the name is then used to look up the
type in the map BTF, with its kind hardcoded to BTF_KIND_STRUCT during
the lookup. The node defines name of the member in this type that has
the type struct bpf_list_node, which is actually used for linking into
the linked list. For now, 'kind' part is hardcoded as struct.
This allows building intrusive linked lists in BPF, using container_of
to obtain pointer to entry, while being completely type safe from the
perspective of the verifier. The verifier knows exactly the type of the
nodes, and knows that list helpers return that type at some fixed offset
where the bpf_list_node member used for this list exists. The verifier
also uses this information to disallow adding types that are not
accepted by a certain list.
For now, no elements can be added to such lists. Support for that is
coming in future patches, hence draining and freeing items is done with
a TODO that will be resolved in a future patch.
Note that the bpf_list_head_free function moves the list out to a local
variable under the lock and releases it, doing the actual draining of
the list items outside the lock. While this helps with not holding the
lock for too long pessimizing other concurrent list operations, it is
also necessary for deadlock prevention: unless every function called in
the critical section would be notrace, a fentry/fexit program could
attach and call bpf_map_update_elem again on the map, leading to the
same lock being acquired if the key matches and lead to a deadlock.
While this requires some special effort on part of the BPF programmer to
trigger and is highly unlikely to occur in practice, it is always better
if we can avoid such a condition.
While notrace would prevent this, doing the draining outside the lock
has advantages of its own, hence it is used to also fix the deadlock
related problem.
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20221114191547.1694267-5-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Kumar Kartikeya Dwivedi [Mon, 14 Nov 2022 19:15:24 +0000 (00:45 +0530)]
bpf: Fix copy_map_value, zero_map_value
The current offset needs to also skip over the already copied region in
addition to the size of the next field. This case manifests where there
are gaps between adjacent special fields.
It was observed that for a map value with size 48, having fields at:
off: 0, 16, 32
size: 4, 16, 16
The current code does:
memcpy(dst + 0, src + 0, 0)
memcpy(dst + 4, src + 4, 12)
memcpy(dst + 20, src + 20, 12)
memcpy(dst + 36, src + 36, 12)
With the fix, it is done correctly as:
memcpy(dst + 0, src + 0, 0)
memcpy(dst + 4, src + 4, 12)
memcpy(dst + 32, src + 32, 0)
memcpy(dst + 48, src + 48, 0)
Fixes:
4d7d7f69f4b1 ("bpf: Adapt copy_map_value for multiple offset case")
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20221114191547.1694267-4-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Kumar Kartikeya Dwivedi [Mon, 14 Nov 2022 19:15:23 +0000 (00:45 +0530)]
bpf: Remove BPF_MAP_OFF_ARR_MAX
In
f71b2f64177a ("bpf: Refactor map->off_arr handling"), map->off_arr
was refactored to be btf_field_offs. The number of field offsets is
equal to maximum possible fields limited by BTF_FIELDS_MAX. Hence, reuse
BTF_FIELDS_MAX as spin_lock and timer no longer are to be handled
specially for offset sorting, fix the comment, and remove incorrect
WARN_ON as its rec->cnt can never exceed this value. The reason to keep
separate constant was the it was always more 2 more than total kptrs.
This is no longer the case.
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20221114191547.1694267-3-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Kumar Kartikeya Dwivedi [Mon, 14 Nov 2022 19:15:22 +0000 (00:45 +0530)]
bpf: Remove local kptr references in documentation
We don't want to commit to a specific name for these. Simply call them
allocated objects coming from bpf_obj_new, which is completely clear in
itself.
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20221114191547.1694267-2-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Andrii Nakryiko [Mon, 14 Nov 2022 19:38:25 +0000 (11:38 -0800)]
Merge branch 'libbpf: Fixed various checkpatch issues'
Kang Minchul says:
====================
This patch series contains various checkpatch fixes
in btf.c, libbpf.c, ringbuf.c.
I know these are trivial but some issues are hard to ignore
and I think these checkpatch issues are accumulating.
v1 -> v2: changed cover letter message.
====================
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Kang Minchul [Sun, 13 Nov 2022 19:06:48 +0000 (04:06 +0900)]
libbpf: checkpatch: Fixed code alignments in ringbuf.c
Fixed some checkpatch issues in ringbuf.c
Signed-off-by: Kang Minchul <tegongkang@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/bpf/20221113190648.38556-4-tegongkang@gmail.com
Kang Minchul [Sun, 13 Nov 2022 19:06:47 +0000 (04:06 +0900)]
libbpf: Fixed various checkpatch issues in libbpf.c
Fixed following checkpatch issues:
WARNING: Block comments use a trailing */ on a separate line
+ * other BPF program's BTF object */
WARNING: Possible repeated word: 'be'
+ * name. This is important to be be able to find corresponding BTF
ERROR: switch and case should be at the same indent
+ switch (ext->kcfg.sz) {
+ case 1: *(__u8 *)ext_val = value; break;
+ case 2: *(__u16 *)ext_val = value; break;
+ case 4: *(__u32 *)ext_val = value; break;
+ case 8: *(__u64 *)ext_val = value; break;
+ default:
ERROR: trailing statements should be on next line
+ case 1: *(__u8 *)ext_val = value; break;
ERROR: trailing statements should be on next line
+ case 2: *(__u16 *)ext_val = value; break;
ERROR: trailing statements should be on next line
+ case 4: *(__u32 *)ext_val = value; break;
ERROR: trailing statements should be on next line
+ case 8: *(__u64 *)ext_val = value; break;
ERROR: code indent should use tabs where possible
+ }$
WARNING: please, no spaces at the start of a line
+ }$
WARNING: Block comments use a trailing */ on a separate line
+ * for faster search */
ERROR: code indent should use tabs where possible
+^I^I^I^I^I^I &ext->kcfg.is_signed);$
WARNING: braces {} are not necessary for single statement blocks
+ if (err) {
+ return err;
+ }
ERROR: code indent should use tabs where possible
+^I^I^I^I sizeof(*obj->btf_modules), obj->btf_module_cnt + 1);$
Signed-off-by: Kang Minchul <tegongkang@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/bpf/20221113190648.38556-3-tegongkang@gmail.com
Kang Minchul [Sun, 13 Nov 2022 19:06:46 +0000 (04:06 +0900)]
libbpf: checkpatch: Fixed code alignments in btf.c
Fixed some checkpatch issues in btf.c
Signed-off-by: Kang Minchul <tegongkang@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/bpf/20221113190648.38556-2-tegongkang@gmail.com
Maryam Tahhan [Sun, 13 Nov 2022 10:33:27 +0000 (05:33 -0500)]
bpf, docs: Fixup cpumap sphinx >= 3.1 warning
Fixup bpf_map_update_elem() declaration to use a single line.
Reported-by: Akira Yokosawa <akiyks@gmail.com>
Signed-off-by: Maryam Tahhan <mtahhan@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: Akira Yokosawa <akiyks@gmail.com>
Acked-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/bpf/20221113103327.3287482-1-mtahhan@redhat.com
David Michael [Sun, 13 Nov 2022 20:52:17 +0000 (15:52 -0500)]
libbpf: Fix uninitialized warning in btf_dump_dump_type_data
GCC 11.3.0 fails to compile btf_dump.c due to the following error,
which seems to originate in btf_dump_struct_data where the returned
value would be uninitialized if btf_vlen returns zero.
btf_dump.c: In function ‘btf_dump_dump_type_data’:
btf_dump.c:2363:12: error: ‘err’ may be used uninitialized in this function [-Werror=maybe-uninitialized]
2363 | if (err < 0)
| ^
Fixes:
920d16af9b42 ("libbpf: BTF dumper support for typed data")
Signed-off-by: David Michael <fedora.dm0@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Stanislav Fomichev <sdf@google.com>
Acked-by: Alan Maguire <alan.maguire@oracle.com>
Link: https://lore.kernel.org/bpf/87zgcu60hq.fsf@gmail.com
David S. Miller [Mon, 14 Nov 2022 11:35:28 +0000 (11:35 +0000)]
Merge tag 'mlx5-updates-2022-11-12' of git://git./linux/kernel/git/saeed/linux
Saeed Mahameed says:
====================
mlx5-updates-2022-11-12
Misc updates to mlx5 driver
1) Support enhanced CQE compression, on ConnectX6-Dx
Reduce irq rate, cpu utilization and latency.
2) Connection tracking: Optimize the pre_ct table lookup for rules
installed on chain 0.
3) implement ethtool get_link_ext_stats for PHY down events
4) Expose device vhca_id to debugfs
5) misc cleanups and trivial changes
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Shenwei Wang [Fri, 11 Nov 2022 15:35:05 +0000 (09:35 -0600)]
net: fec: add xdp and page pool statistics
Added xdp and page pool statistics.
In order to make the implementation simple and compatible, the patch
uses the 32bit integer to record the XDP statistics.
Signed-off-by: Shenwei Wang <shenwei.wang@nxp.com>
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 14 Nov 2022 11:24:17 +0000 (11:24 +0000)]
Merge branch 'sparx5-sorted-VCAP-rules'
Steen Hegelund says:
====================
net: Add support for sorted VCAP rules in Sparx5
This provides support for adding Sparx5 VCAP rules in sorted order, VCAP
rule counters and TC filter matching on ARP frames.
It builds on top of the initial IS2 VCAP support found in these series:
https://lore.kernel.org/all/
20221020130904.1215072-1-steen.hegelund@microchip.com/
https://lore.kernel.org/all/
20221109114116.3612477-1-steen.hegelund@microchip.com/
Functionality
=============
When a new VCAP rule is added the driver will now ensure that the rule is
inserted in sorted order, and when a rule is removed, the remaining rules
will be moved to keep the sorted order and remove any gaps in the VCAP
address space.
A VCAP rule is ordered using these 3 values:
- Rule size: the count of VCAP addresses used by the rule. The largest
rule have highest priority
- Rule User: The rules are ordered by the user enumeration
- Priority: The priority provided in the flower filter. The lowest value
has the highest priority.
A VCAP instance may contain the counter as part of the VCAP cache area, and
this counter may be one or more bits in width. This type of counter
automatically increments its value when the rule is hit.
Other VCAP instances have a dedicated counter area outside of the VCAP and
in this case the rule must contain the counter id to be able to locate the
counter value and cause the counter to be incremented. In this case there
must also be a VCAP rule action that sets the counter id.
The Sparx5 IS2 VCAP uses a dedicated counter area with 32bit counters.
This series adds support for getting VCAP rule counters and provide these
via the TC statistic interface.
This only support packet counters, not byte counters.
Finally the series adds support for the ARP frame dissector and configures
the Sparx5 IS2 VCAP to generate the ARP keyset when ARP traffic is
received.
Delivery:
=========
This is current plan for delivering the full VCAP feature set of Sparx5:
- DebugFS support for inspecting rules
- TC protocol all support
- Sparx5 IS0 VCAP support
- TC policer and drop action support (depends on the Sparx5 QoS support
upstreamed separately)
- Sparx5 ES0 VCAP support
- TC flower template support
- TC matchall filter support for mirroring and policing ports
- TC flower filter mirror action support
- Sparx5 ES2 VCAP support
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Steen Hegelund [Fri, 11 Nov 2022 13:05:19 +0000 (14:05 +0100)]
net: microchip: sparx5: Add KUNIT test of counters and sorted rules
This tests the insert, move and deleting of rules and checks that the
unused VCAP addresses are initialized correctly.
Signed-off-by: Steen Hegelund <steen.hegelund@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Steen Hegelund [Fri, 11 Nov 2022 13:05:18 +0000 (14:05 +0100)]
net: microchip: sparx5: Add support for TC flower filter statistics
This provides flower filter packet statistics (bytes are not supported) via
the dedicated IS2 counter feature.
All rules having the same TC cookie will contribute to the packet
statistics for the filter as they are considered to be part of the same TC
flower filter.
Signed-off-by: Steen Hegelund <steen.hegelund@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Steen Hegelund [Fri, 11 Nov 2022 13:05:17 +0000 (14:05 +0100)]
net: microchip: sparx5: Add support for IS2 VCAP rule counters
This adds API methods to set and get a rule counter.
A VCAP instance may contain the counter as part of the VCAP cache area, and
this counter may be one or more bits in width. This type of counter
automatically increments it value when the rule is hit.
Other VCAP instances have a dedicated counter area outside of the VCAP and
in this case the rule must contain the counter id to be able to locate the
counter value. In this case there must also be a rule action that updates
the counter using the rule id when the rule is hit.
The Sparx5 IS2 VCAP uses a dedicated counter area.
Signed-off-by: Steen Hegelund <steen.hegelund@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Steen Hegelund [Fri, 11 Nov 2022 13:05:16 +0000 (14:05 +0100)]
net: microchip: sparx5: Add/delete rules in sorted order
This adds a sorting criteria to rule insertion and deletion.
The criteria is (in the listed order):
- Rule size (largest size first)
- User (based on an enumerated user value)
- Priority (highest priority first, aka lowest value)
When a rule is deleted the other rules may need to be moved to fill the gap
to use the available VCAP address space in the best possible way.
Signed-off-by: Steen Hegelund <steen.hegelund@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Steen Hegelund [Fri, 11 Nov 2022 13:05:15 +0000 (14:05 +0100)]
net: microchip: sparx5: Add support for TC flower ARP dissector
This add support for Sparx5 for dissecting TC ARP flower filter keys and
sets up the Sparx5 IS2 VCAP to generate the ARP keyset for ARP frames.
Signed-off-by: Steen Hegelund <steen.hegelund@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Steen Hegelund [Fri, 11 Nov 2022 13:05:14 +0000 (14:05 +0100)]
net: flow_offload: add support for ARP frame matching
This adds a new flow_rule_match_arp function that allows drivers
to be able to dissect ARP frames.
Signed-off-by: Steen Hegelund <steen.hegelund@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
xu xin [Fri, 11 Nov 2022 09:04:20 +0000 (09:04 +0000)]
ipasdv4/tcp_ipv4: remove redundant assignment
The value of 'st->state' has been verified as "TCP_SEQ_STATE_LISTENING",
it's unnecessary to assign TCP_SEQ_STATE_LISTENING to it, so we can remove it.
Signed-off-by: xu xin <xu.xin16@zte.com.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 14 Nov 2022 10:47:07 +0000 (10:47 +0000)]
Merge branch 'ibmvnic-affinity-hints'
Nick Child says:
====================
ibmvnic: Introduce affinity hint support
This is a patchset to do 3 things to improve ibmvnic performance:
1. Assign affinity hints to ibmvnic queue irq's
2. Update affinity hints on cpu hotplug events
3. Introduce transmit packet steering (XPS)
NOTE: If irqbalance is running, you need to stop it from overriding
our affinity hints. To do this you can do one of:
- systemctl stop irqbalance
- ban the ibmvnic module irqs
- you must have the latest irqbalance v9.2, the banmod argument was broken before this
- in /etc/sysconfig/irqbalance -> IRQBALANCE_ARGS="--banmod=ibmvnic"
- systemctl restart irqbalance
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Nick Child [Thu, 10 Nov 2022 21:32:18 +0000 (15:32 -0600)]
ibmvnic: Update XPS assignments during affinity binding
Transmit Packet Steering (XPS) maps cpu numbers to transmit
queues. By running the same connection on the same set of cpu's,
contention for the queue and cache miss rate can be minimized.
When assigning a cpu mask for a tranmit queues irq number, assign
the same cpu mask as the set of cpu's that XPS should use for that
queue.
Signed-off-by: Thomas Falcon <tlfalcon@linux.ibm.com>
Signed-off-by: Dany Madden <drt@linux.ibm.com>
Signed-off-by: Nick Child <nnac123@linux.ibm.com>
Reviewed-by: Rick Lindsley <ricklind@linux.ibm.com>
Reviewed-by: Haren Myneni <haren@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Nick Child [Thu, 10 Nov 2022 21:32:17 +0000 (15:32 -0600)]
ibmvnic: Add hotpluggable CPU callbacks to reassign affinity hints
When CPU's are added and removed, ibmvnic devices will reassign
hint values. Introduce a new cpu hotplug state CPUHP_IBMVNIC_DEAD
to signal to ibmvnic devices that the CPU has been removed and it
is time to reset affinity hint assignments. On the other hand,
when CPU's are being added, add a state instance to
CPUHP_AP_ONLINE_DYN which will trigger a reassignment of affinity
hints once the new CPU's are online. This implementation is based
on the virtio_net driver.
Signed-off-by: Thomas Falcon <tlfalcon@linux.ibm.com>
Signed-off-by: Dany Madden <drt@linux.ibm.com>
Signed-off-by: Nick Child <nnac123@linux.ibm.com>
Reviewed-by: Rick Lindsley <ricklind@linux.ibm.com>
Reviewed-by: Haren Myneni <haren@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Nick Child [Thu, 10 Nov 2022 21:32:16 +0000 (15:32 -0600)]
ibmvnic: Assign IRQ affinity hints to device queues
Assign affinity hints to ibmvnic device queue interrupts.
Affinity hints are assigned and removed during sub-crq init and
teardown, respectively. This update should improve latency if
utilized as interrupt lines and processing are more equally
distributed among CPU's. This implementation is based on the
virtio_net driver.
Signed-off-by: Thomas Falcon <tlfalcon@linux.ibm.com>
Signed-off-by: Dany Madden <drt@linux.ibm.com>
Signed-off-by: Nick Child <nnac123@linux.ibm.com>
Reviewed-by: Rick Lindsley <ricklind@linux.ibm.com>
Reviewed-by: Haren Myneni <haren@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Md Fahad Iqbal Polash [Thu, 10 Nov 2022 13:03:53 +0000 (14:03 +0100)]
ice: virtchnl rss hena support
Add support for 2 virtchnl msgs:
VIRTCHNL_OP_SET_RSS_HENA
VIRTCHNL_OP_GET_RSS_HENA_CAPS
The first one allows VFs to clear all previously programmed
RSS configuration and customize it. The second one returns
the RSS HENA bits allowed by the hardware.
Introduce ice_err_to_virt_err which converts kernel
specific errors to virtchnl errors.
Signed-off-by: Md Fahad Iqbal Polash <md.fahad.iqbal.polash@intel.com>
Signed-off-by: Wojciech Drewek <wojciech.drewek@intel.com>
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Chuang Wang [Thu, 10 Nov 2022 07:31:25 +0000 (15:31 +0800)]
net: tun: rebuild error handling in tun_get_user
The error handling in tun_get_user is very scattered.
This patch unifies error handling, reduces duplication of code, and
makes the logic clearer.
Signed-off-by: Chuang Wang <nashuiliang@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Saeed Mahameed [Wed, 26 Oct 2022 10:26:31 +0000 (11:26 +0100)]
net/mlx5e: ethtool: get_link_ext_stats for PHY down events
Implement ethtool_op get_link_ext_stats for PHY down events
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Reviewed-by: Gal Pressman <gal@nvidia.com>
Oz Shlomo [Mon, 31 Oct 2022 09:00:30 +0000 (09:00 +0000)]
net/mlx5e: CT, optimize pre_ct table lookup
The pre_ct table realizes in hardware the act_ct cache logic, bypassing
the CT table if the ct state was already set by a previous ct lookup.
As such, the pre_ct table will always miss for chain 0 filters.
Optimize the pre_ct table lookup for rules installed on chain 0.
Signed-off-by: Oz Shlomo <ozsh@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Tariq Toukan [Wed, 14 Sep 2022 07:35:04 +0000 (10:35 +0300)]
net/mlx5e: kTLS, Use a single async context object per a callback bulk
A single async context object is sufficient to wait for the completions
of many callbacks. Switch to using one instance per a bulk of commands.
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Gal Pressman <gal@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Tariq Toukan [Mon, 12 Sep 2022 18:43:18 +0000 (21:43 +0300)]
net/mlx5e: kTLS, Remove unnecessary per-callback completion
Waiting on a completion object for each callback before cleaning up their
async contexts is not necessary, as this is already implied in the
mlx5_cmd_cleanup_async_ctx() API.
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Gal Pressman <gal@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Tariq Toukan [Wed, 14 Sep 2022 08:02:34 +0000 (11:02 +0300)]
net/mlx5e: kTLS, Remove unused work field
Work field in struct mlx5e_async_ctx is not used. Remove it.
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Gal Pressman <gal@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Roi Dayan [Wed, 21 Sep 2022 06:17:15 +0000 (09:17 +0300)]
net/mlx5e: TC, Remove redundant WARN_ON()
The case where the packet is not offloaded and needs to be restored
to slow path and couldn't find expected tunnel information should not
dump a call trace to the user. there is a debug call.
Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Guy Truzman [Thu, 29 Sep 2022 12:12:51 +0000 (15:12 +0300)]
net/mlx5e: Add error flow when failing update_rx
Up until now, return value of update_rx was ignored. Therefore, flow
continues even if it fails. Add error flow in case of update_rx fails in
mlx5e_open_locked, mlx5i_open and mlx5i_pkey_open.
Signed-off-by: Guy Truzman <gtruzman@nvidia.com>
Reviewed-by: Aya Levin <ayal@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Tariq Toukan [Wed, 18 May 2022 08:46:35 +0000 (11:46 +0300)]
net/mlx5e: Move params kernel log print to probe function
Params info print was meant to be printed on load.
With time, new calls to mlx5e_init_rq_type_params and
mlx5e_build_rq_params were added, mistakenly printing
the params once again.
Move the print to were it belongs, in mlx5e_probe.
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Gal Pressman <gal@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Ofer Levi [Tue, 9 Feb 2021 15:48:11 +0000 (17:48 +0200)]
net/mlx5e: Support enhanced CQE compression
CQE compression feature improves performance by reducing PCI bandwidth
bottleneck on CQEs write.
Enhanced CQE compression introduced in ConnectX-6 and it aims to reduce
CPU utilization of SW side packets decompression by eliminating the
need to rewrite ownership bit, which is likely to cost a cache-miss, is
replaced by validity byte handled solely by HW.
Another advantage of the enhanced feature is that session packets are
available to SW as soon as a single CQE slot is filled, instead of
waiting for session to close, this improves packet latency from NIC to
host.
Performance:
Following are tested scenarios and reults comparing basic and enahnced
CQE compression.
setup: IXIA 100GbE connected directly to port 0 and port 1 of
ConnectX-6 Dx 100GbE dual port.
Case #1 RX only, single flow goes to single queue:
IRQ rate reduced by ~ 30%, CPU utilization improved by 2%.
Case #2 IP forwarding from port 1 to port 0 single flow goes to
single queue:
Avg latency improved from 60us to 21us, frame loss improved from 0.5% to 0.0%.
Case #3 IP forwarding from port 1 to port 0 Max Throughput IXIA sends
100%, 8192 UDP flows, goes to 24 queues:
Enhanced is equal or slightly better than basic.
Testing the basic compression feature with this patch shows there is
no perfrormance degradation of the basic compression feature.
Signed-off-by: Ofer Levi <oferle@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Gal Pressman [Sun, 4 Sep 2022 10:29:26 +0000 (13:29 +0300)]
net/mlx5e: Use clamp operation instead of open coding it
Replace the min/max operations with a single clamp.
Signed-off-by: Gal Pressman <gal@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Anisse Astier [Mon, 31 Oct 2022 16:56:04 +0000 (17:56 +0100)]
net/mlx5e: remove unused list in arfs
This is never used, and probably something that was intended to be used
before per-protocol hash tables were chosen instead.
Signed-off-by: Anisse Astier <anisse@astier.eu>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Eli Cohen [Wed, 21 Sep 2022 10:33:29 +0000 (13:33 +0300)]
net/mlx5: Expose vhca_id to debugfs
hca_id is an identifier of an mlx5_core instance within the hardware.
This identifier may be required for troubleshooting.
Expose it to debugfs.
Example:
$ cat /sys/kernel/debug/mlx5/mlx5_core.sf.2/vhca_id
0x12
Signed-off-by: Eli Cohen <elic@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Moshe Shemesh [Mon, 8 Aug 2022 17:02:59 +0000 (20:02 +0300)]
net/mlx5: Unregister traps on driver unload flow
Before this patch, devlink traps are registered only on full driver
probe and unregistered on driver removal. As devlink traps are not
usable once driver functionality is unloaded, it should be unrgeistered
also on flows that unload the driver and then registered when loaded
back, e.g. devlink reload flow.
Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Aya Levin <ayal@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Colin Ian King [Mon, 31 Oct 2022 08:01:04 +0000 (08:01 +0000)]
net/mlx5: Fix spelling mistake "destoy" -> "destroy"
There is a spelling mistake in an error message. Fix it.
Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Roi Dayan [Thu, 27 Oct 2022 08:35:12 +0000 (11:35 +0300)]
net/mlx5: Bridge, Use debug instead of warn if entry doesn't exists
There is no need for the warn if entry already removed.
Use debug print like in the update flow.
Also update the messages so user can identify if the it's
from the update flow or remove flow.
Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Vlad Buslov <vladbu@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Eric Dumazet [Thu, 10 Nov 2022 19:02:39 +0000 (19:02 +0000)]
tcp: tcp_wfree() refactoring
Use try_cmpxchg() (instead of cmpxchg()) in a more readable way.
oval = smp_load_acquire(&sk->sk_tsq_flags);
do {
...
} while (!try_cmpxchg(&sk->sk_tsq_flags, &oval, nval));
Reduce indentation level.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20221110190239.3531280-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Eric Dumazet [Thu, 10 Nov 2022 17:48:29 +0000 (17:48 +0000)]
tcp: adopt try_cmpxchg() in tcp_release_cb()
try_cmpxchg() is slighly more efficient (at least on x86),
and smp_load_acquire(&sk->sk_tsq_flags) could avoid a KCSAN report.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20221110174829.3403442-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Ido Schimmel [Thu, 10 Nov 2022 08:54:22 +0000 (10:54 +0200)]
bridge: Add missing parentheses
No changes in generated code.
Reported-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Link: https://lore.kernel.org/r/20221110085422.521059-1-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Sat, 12 Nov 2022 05:23:23 +0000 (21:23 -0800)]
Merge branch 'dt-bindings-net-qcom-ipa-relax-some-restrictions'
Alex Elder says:
====================
dt-bindings: net: qcom,ipa: relax some restrictions
The first patch in this series simply removes an unnecessary
requirement in the IPA binding. Previously, if the modem was doing
GSI firmware loading, the firmware name property was required to
*not* be present. There is no harm in having the firmware name be
specified, so this restriction isn't needed.
The second patch restates a requirement on the "memory-region"
property more accurately.
These binding changes have no impact on existing code or DTS files.
These aren't really bug fixes, so no need to back-port.
====================
Link: https://lore.kernel.org/r/20221110195619.1276302-1-elder@linaro.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Alex Elder [Thu, 10 Nov 2022 19:56:18 +0000 (13:56 -0600)]
dt-bindings: net: qcom,ipa: restate a requirement
Either the AP or modem loads GSI firmware. If the modem-init
property is present, the modem loads it. Otherwise, the AP loads
it, and in that case the memory-region property must be defined.
Currently this requirement is expressed as one or the other of the
modem-init or the memory-region property being required. But it's
harmless for the memory-region to be present if the modem is loading
firmware (it'll just be ignored).
Restate the requirement so that the memory-region property is
required only if modem-init is not present.
Signed-off-by: Alex Elder <elder@linaro.org>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Alex Elder [Thu, 10 Nov 2022 19:56:17 +0000 (13:56 -0600)]
dt-bindings: net: qcom,ipa: remove an unnecessary restriction
Commit
d8604b209e9b3 ("dt-bindings: net: qcom,ipa: add firmware-name
property") added a requirement for a "firmware-name" property that
is more restrictive than necessary.
If the AP loads GSI firmware, the name of the firmware file to use
may optionally be provided via a "firmware-name" property. If the
*modem* loads GSI firmware, "firmware-name" doesn't need to be
supplied--but it's harmless to do so (it will simply be ignored).
Remove the unnecessary restriction, and allow "firware-name" to be
supplied even if it's not needed.
Signed-off-by: Alex Elder <elder@linaro.org>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Angelo Dureghello [Thu, 10 Nov 2022 09:10:27 +0000 (10:10 +0100)]
net: dsa: mv88e6xxx: enable set_policy
Enabling set_policy capability for mv88e6321.
Signed-off-by: Angelo Dureghello <angelo.dureghello@timesys.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://lore.kernel.org/r/20221110091027.998073-1-angelo.dureghello@timesys.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Sat, 12 Nov 2022 05:19:50 +0000 (21:19 -0800)]
Merge branch 'mptcp-miscellaneous-refactoring-and-small-fixes'
Mat Martineau says:
====================
mptcp: Miscellaneous refactoring and small fixes
Patches 1-3 do some refactoring to more consistently handle sock casts,
and to remove some duplicate code. No functional changes.
Patch 4 corrects a variable name in a self test, but does not change
functionality since the same value gets used due to bash's
scoping rules.
Patch 5 rewords a comment.
====================
Link: https://lore.kernel.org/r/20221110232322.125068-1-mathew.j.martineau@linux.intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Mat Martineau [Thu, 10 Nov 2022 23:23:22 +0000 (15:23 -0800)]
mptcp: Fix grammar in a comment
We kept getting initial patches from new contributors to remove a
duplicate 'the' (since grammar checking scripts flag it), but submitters
never followed up after code review.
Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Geliang Tang [Thu, 10 Nov 2022 23:23:21 +0000 (15:23 -0800)]
selftests: mptcp: use max_time instead of time
'time' is the local variable of run_test() function, while 'max_time' is
the local variable of do_transfer() function. So in do_transfer(),
$max_time should be used, not $time.
Please note that here $time == $max_time so the behaviour is not changed
but the right variable is used.
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Geliang Tang <geliang.tang@suse.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Geliang Tang [Thu, 10 Nov 2022 23:23:20 +0000 (15:23 -0800)]
mptcp: get sk from msk directly
Use '(struct sock *)msk' to get 'sk' from 'msk' in a more direct way
instead of using '&msk->sk.icsk_inet.sk'.
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Geliang Tang <geliang.tang@suse.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Geliang Tang [Thu, 10 Nov 2022 23:23:19 +0000 (15:23 -0800)]
mptcp: change 'first' as a parameter
The function mptcp_subflow_process_delegated() uses the input ssk first,
while __mptcp_check_push() invokes the packet scheduler first.
So this patch adds a new parameter named 'first' for the function
__mptcp_subflow_push_pending() to deal with these two cases separately.
With this change, the code that invokes the packet scheduler in the
function __mptcp_check_push() can be removed, and replaced by invoking
__mptcp_subflow_push_pending() directly.
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Geliang Tang <geliang.tang@suse.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Geliang Tang [Thu, 10 Nov 2022 23:23:18 +0000 (15:23 -0800)]
mptcp: use msk instead of mptcp_sk
Use msk instead of mptcp_sk(sk) in the functions where the variable
"msk = mptcp_sk(sk)" has been defined.
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Geliang Tang <geliang.tang@suse.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Sat, 12 Nov 2022 02:33:02 +0000 (18:33 -0800)]
Merge tag 'for-netdev' of https://git./linux/kernel/git/bpf/bpf-next
Andrii Nakryiko says:
====================
bpf-next 2022-11-11
We've added 49 non-merge commits during the last 9 day(s) which contain
a total of 68 files changed, 3592 insertions(+), 1371 deletions(-).
The main changes are:
1) Veristat tool improvements to support custom filtering, sorting, and replay
of results, from Andrii Nakryiko.
2) BPF verifier precision tracking fixes and improvements,
from Andrii Nakryiko.
3) Lots of new BPF documentation for various BPF maps, from Dave Tucker,
Donald Hunter, Maryam Tahhan, Bagas Sanjaya.
4) BTF dedup improvements and libbpf's hashmap interface clean ups, from
Eduard Zingerman.
5) Fix veth driver panic if XDP program is attached before veth_open, from
John Fastabend.
6) BPF verifier clean ups and fixes in preparation for follow up features,
from Kumar Kartikeya Dwivedi.
7) Add access to hwtstamp field from BPF sockops programs,
from Martin KaFai Lau.
8) Various fixes for BPF selftests and samples, from Artem Savkov,
Domenico Cerasuolo, Kang Minchul, Rong Tao, Yang Jihong.
9) Fix redirection to tunneling device logic, preventing skb->len == 0, from
Stanislav Fomichev.
* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (49 commits)
selftests/bpf: fix veristat's singular file-or-prog filter
selftests/bpf: Test skops->skb_hwtstamp
selftests/bpf: Fix incorrect ASSERT in the tcp_hdr_options test
bpf: Add hwtstamp field for the sockops prog
selftests/bpf: Fix xdp_synproxy compilation failure in 32-bit arch
bpf, docs: Document BPF_MAP_TYPE_ARRAY
docs/bpf: Document BPF map types QUEUE and STACK
docs/bpf: Document BPF ARRAY_OF_MAPS and HASH_OF_MAPS
docs/bpf: Document BPF_MAP_TYPE_CPUMAP map
docs/bpf: Document BPF_MAP_TYPE_LPM_TRIE map
libbpf: Hashmap.h update to fix build issues using LLVM14
bpf: veth driver panics when xdp prog attached before veth_open
selftests: Fix test group SKIPPED result
selftests/bpf: Tests for btf_dedup_resolve_fwds
libbpf: Resolve unambigous forward declarations
libbpf: Hashmap interface update to allow both long and void* keys/values
samples/bpf: Fix sockex3 error: Missing BPF prog type
selftests/bpf: Fix u32 variable compared with less than zero
Documentation: bpf: Escape underscore in BPF type name prefix
selftests/bpf: Use consistent build-id type for liburandom_read.so
...
====================
Link: https://lore.kernel.org/r/20221111233733.1088228-1-andrii@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Sat, 12 Nov 2022 02:18:09 +0000 (18:18 -0800)]
Merge branch 'net-vlan-claim-one-bit-from-sk_buff'
Eric Dumazet says:
====================
net: vlan: claim one bit from sk_buff
First patch claims skb->vlan_present.
This means some bpf changes, eg for sparc32 that I could not test.
Second patch removes one conditional test in gro_list_prepare().
====================
Link: https://lore.kernel.org/r/20221109095759.1874969-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Eric Dumazet [Wed, 9 Nov 2022 09:57:59 +0000 (09:57 +0000)]
net: gro: no longer use skb_vlan_tag_present()
We can remove a conditional test in gro_list_prepare()
by comparing vlan_all fields of the two skbs.
Notes:
While comparing the vlan_proto is not strictly needed,
because part of the following compare_ether_header() call,
using 32bit word is actually faster than using 16bit values.
napi_reuse_skb() makes sure to clear skb->vlan_all,
as it already calls __vlan_hwaccel_clear_tag()
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Martin KaFai Lau <martin.lau@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Eric Dumazet [Wed, 9 Nov 2022 09:57:58 +0000 (09:57 +0000)]
net: remove skb->vlan_present
skb->vlan_present seems redundant.
We can instead derive it from this boolean expression:
vlan_present = skb->vlan_proto != 0 || skb->vlan_tci != 0
Add a new union, to access both fields in a single load/store
when possible.
union {
u32 vlan_all;
struct {
__be16 vlan_proto;
__u16 vlan_tci;
};
};
This allows following patch to remove a conditional test in GRO stack.
Note:
We move remcsum_offload to keep TC_AT_INGRESS_MASK
and SKB_MONO_DELIVERY_TIME_MASK unchanged.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Yonghong Song <yhs@fb.com>
Acked-by: Martin KaFai Lau <martin.lau@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Andrii Nakryiko [Fri, 11 Nov 2022 18:12:42 +0000 (10:12 -0800)]
selftests/bpf: fix veristat's singular file-or-prog filter
Fix the bug of filtering out filename too early, before we know the
program name, if using unified file-or-prog filter (i.e., -f
<any-glob>). Because we try to filter BPF object file early without
opening and parsing it, if any_glob (file-or-prog) filter is used we
have to accept any filename just to get program name, which might match
any_glob.
Fixes:
10b1b3f3e56a ("selftests/bpf: consolidate and improve file/prog filtering in veristat")
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/r/20221111181242.2101192-1-andrii@kernel.org
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Andrii Nakryiko [Fri, 11 Nov 2022 21:10:11 +0000 (13:10 -0800)]
Merge branch 'bpf: Add hwtstamp field for the sockops prog'
Martin KaFai Lau says:
====================
From: Martin KaFai Lau <martin.lau@kernel.org>
The bpf-tc prog has already been able to access the
skb_hwtstamps(skb)->hwtstamp. This set extends the same hwtstamp
access to the sockops prog.
v2:
- Fixed the btf_dump selftest which depends on the
last member of 'struct bpf_sock_ops'.
====================
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Martin KaFai Lau [Mon, 7 Nov 2022 23:04:20 +0000 (15:04 -0800)]
selftests/bpf: Test skops->skb_hwtstamp
This patch tests reading the skops->skb_hwtstamp field.
A local test was also done such that the shinfo hwtstamp was temporary
set to a non zero value in the kernel bpf_skops_parse_hdr()
and the same value can be read by the skops test.
An adjustment is needed to the btf_dump selftest because
the changes in the 'struct bpf_sock_ops'.
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20221107230420.4192307-4-martin.lau@linux.dev
Martin KaFai Lau [Mon, 7 Nov 2022 23:04:19 +0000 (15:04 -0800)]
selftests/bpf: Fix incorrect ASSERT in the tcp_hdr_options test
This patch fixes the incorrect ASSERT test in tcp_hdr_options during
the CHECK to ASSERT macro cleanup.
Fixes:
3082f8cd4ba3 ("selftests/bpf: Convert tcp_hdr_options test to ASSERT_* macros")
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Wang Yufen <wangyufen@huawei.com>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20221107230420.4192307-3-martin.lau@linux.dev
Martin KaFai Lau [Mon, 7 Nov 2022 23:04:18 +0000 (15:04 -0800)]
bpf: Add hwtstamp field for the sockops prog
The bpf-tc prog has already been able to access the
skb_hwtstamps(skb)->hwtstamp. This patch extends the same hwtstamp
access to the sockops prog.
In sockops, the skb is also available to the bpf prog during
the BPF_SOCK_OPS_PARSE_HDR_OPT_CB event. There is a use case
that the hwtstamp will be useful to the sockops prog to better
measure the one-way-delay when the sender has put the tx
timestamp in the tcp header option.
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20221107230420.4192307-2-martin.lau@linux.dev
Yang Jihong [Fri, 11 Nov 2022 03:08:36 +0000 (11:08 +0800)]
selftests/bpf: Fix xdp_synproxy compilation failure in 32-bit arch
xdp_synproxy fails to be compiled in the 32-bit arch, log is as follows:
xdp_synproxy.c: In function 'parse_options':
xdp_synproxy.c:175:36: error: left shift count >= width of type [-Werror=shift-count-overflow]
175 | *tcpipopts = (mss6 << 32) | (ttl << 24) | (wscale << 16) | mss4;
| ^~
xdp_synproxy.c: In function 'syncookie_open_bpf_maps':
xdp_synproxy.c:289:28: error: cast from pointer to integer of different size [-Werror=pointer-to-int-cast]
289 | .map_ids = (__u64)map_ids,
| ^
Fix it.
Fixes:
fb5cd0ce70d4 ("selftests/bpf: Add selftests for raw syncookie helpers")
Signed-off-by: Yang Jihong <yangjihong1@huawei.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20221111030836.37632-1-yangjihong1@huawei.com
Dave Tucker [Wed, 9 Nov 2022 17:46:04 +0000 (17:46 +0000)]
bpf, docs: Document BPF_MAP_TYPE_ARRAY
Add documentation for the BPF_MAP_TYPE_ARRAY including kernel version
introduced, usage and examples. Also document BPF_MAP_TYPE_PERCPU_ARRAY
which is similar.
Co-developed-by: Donald Hunter <donald.hunter@gmail.com>
Signed-off-by: Dave Tucker <dave@dtucker.co.uk>
Signed-off-by: Donald Hunter <donald.hunter@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Reviewed-by: Maryam Tahhan <mtahhan@redhat.com>
Reviewed-by: Bagas Sanjaya <bagasdotme@gmail.com>
Link: https://lore.kernel.org/bpf/20221109174604.31673-2-donald.hunter@gmail.com
Donald Hunter [Tue, 8 Nov 2022 09:33:14 +0000 (09:33 +0000)]
docs/bpf: Document BPF map types QUEUE and STACK
Add documentation for BPF_MAP_TYPE_QUEUE and BPF_MAP_TYPE_STACK,
including usage and examples.
Signed-off-by: Donald Hunter <donald.hunter@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20221108093314.44851-1-donald.hunter@gmail.com
Donald Hunter [Tue, 8 Nov 2022 10:22:15 +0000 (10:22 +0000)]
docs/bpf: Document BPF ARRAY_OF_MAPS and HASH_OF_MAPS
Add documentation for the ARRAY_OF_MAPS and HASH_OF_MAPS map types,
including usage and examples.
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Donald Hunter <donald.hunter@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20221108102215.47297-1-donald.hunter@gmail.com
Maryam Tahhan [Mon, 7 Nov 2022 16:52:07 +0000 (11:52 -0500)]
docs/bpf: Document BPF_MAP_TYPE_CPUMAP map
Add documentation for BPF_MAP_TYPE_CPUMAP including
kernel version introduced, usage and examples.
Co-developed-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Maryam Tahhan <mtahhan@redhat.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20221107165207.2682075-2-mtahhan@redhat.com
Donald Hunter [Tue, 1 Nov 2022 11:45:42 +0000 (11:45 +0000)]
docs/bpf: Document BPF_MAP_TYPE_LPM_TRIE map
Add documentation for BPF_MAP_TYPE_LPM_TRIE including kernel
BPF helper usage, userspace usage and examples.
Signed-off-by: Donald Hunter <donald.hunter@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20221101114542.24481-2-donald.hunter@gmail.com
Eduard Zingerman [Thu, 10 Nov 2022 22:32:40 +0000 (00:32 +0200)]
libbpf: Hashmap.h update to fix build issues using LLVM14
A fix for the LLVM compilation error while building bpftool.
Replaces the expression:
_Static_assert((p) == NULL || ...)
by expression:
_Static_assert((__builtin_constant_p((p)) ? (p) == NULL : 0) || ...)
When "p" is not a constant the former is not considered to be a
constant expression by LLVM 14.
The error was introduced in the following patch-set: [1].
The error was reported here: [2].
[1] https://lore.kernel.org/bpf/
20221109142611.879983-1-eddyz87@gmail.com/
[2] https://lore.kernel.org/all/
202211110355.BcGcbZxP-lkp@intel.com/
Reported-by: kernel test robot <lkp@intel.com>
Fixes:
c302378bc157 ("libbpf: Hashmap interface update to allow both long and void* keys/values")
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/bpf/20221110223240.1350810-1-eddyz87@gmail.com
David S. Miller [Fri, 11 Nov 2022 10:58:39 +0000 (10:58 +0000)]
Merge branch 'ptp-adjfreq-copnvert'
Jacob Keller says:
====================
ptp: convert remaining users of .adjfreq
A handful of drivers remain which still use the .adjfreq interface instead
of the newer .adjfine interface. The new interface is preferred as it has a
more precise adjustment using scaled parts per million.
A handful of the remaining drivers are implemented with a common pattern
that can be refactored to use the adjust_by_scaled_ppm and
diff_by_scaled_ppm helper functions. These include the ptp_phc, ptp_ixp64x,
tg3, hclge, stmac, cpts and bnxt drivers. These are each refactored in a
separate change.
The remaining drivers, bnx2x, liquidio, cxgb4, fec, and qede implement
.adjfreq in a way different from the normal pattern expected by
adjust_by_scaled_ppm. Fixing these drivers to properly use .adjfine requires
specific knowledge of the hardware implementation. Instead I simply refactor
them to use .adjfine and convert scaled_ppm into ppb using the
scaled_ppm_to_ppb function.
Finally, the .adjfreq implementation interface is removed entirely. This
simplifies the interface and ensures that new drivers must implement the new
interface as they no longer have an alternative.
This still leaves parts per billion used as part of the max_adj interface,
and the core PTP stack still converts scaled_ppm to ppb to check this. I
plan to investigate fixing this in the future.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Jacob Keller [Wed, 9 Nov 2022 23:09:45 +0000 (15:09 -0800)]
ptp: remove the .adjfreq interface function
Now that all drivers have been converted to .adjfine, we can remove the
.adjfreq from the interface structure.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Cc: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jacob Keller [Wed, 9 Nov 2022 23:09:44 +0000 (15:09 -0800)]
ptp: convert remaining drivers to adjfine interface
Convert all remaining drivers that still use .adjfreq to the newer .adjfine
implementation. These drivers are not straightforward, as they use
non-standard methods of programming their hardware. They are all converted
to use scaled_ppm_to_ppb to get the parts per billion value that their
logic depends on.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Cc: Ariel Elior <aelior@marvell.com>
Cc: Sudarsana Kalluru <skalluru@marvell.com>
Cc: Manish Chopra <manishc@marvell.com>
Cc: Derek Chickles <dchickles@marvell.com>
Cc: Satanand Burla <sburla@marvell.com>
Cc: Felix Manlunas <fmanlunas@marvell.com>
Cc: Raju Rangoju <rajur@chelsio.com>
Cc: Joakim Zhang <qiangqing.zhang@nxp.com>
Cc: Edward Cree <ecree.xilinx@gmail.com>
Cc: Martin Habets <habetsm.xilinx@gmail.com>
Cc: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jacob Keller [Wed, 9 Nov 2022 23:09:43 +0000 (15:09 -0800)]
ptp: bnxt: convert .adjfreq to .adjfine
When the BNXT_FW_CAP_PTP_RTC flag is not set, the bnxt driver implements
.adjfreq on a cyclecounter in terms of the straightforward "base * ppb / 1
billion" calculation. When BNXT_FW_CAP_PTP_RTC is set, the driver forwards
the ppb value to firmware for configuration.
Convert the driver to the newer .adjfine interface, updating the
cyclecounter calculation to use adjust_by_scaled_ppm to perform the
calculation. Use scaled_ppm_to_ppb when forwarding the correction to
firmware.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Cc: Michael Chan <michael.chan@broadcom.com>
Cc: Richard Cochran <richardcochran@gmail.com>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jacob Keller [Wed, 9 Nov 2022 23:09:42 +0000 (15:09 -0800)]
ptp: cpts: convert .adjfreq to .adjfine
The cpts implementation of .adjfreq is implemented in terms of a
straight forward "base * ppb / 1 billion" calculation.
Convert this to the newer .adjfine, using the recently added
adjust_by_scaled_ppm helper function.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jacob Keller [Wed, 9 Nov 2022 23:09:41 +0000 (15:09 -0800)]
ptp: stmac: convert .adjfreq to .adjfine
The stmac implementation of .adjfreq is implemented in terms of a
straight forward "base * ppb / 1 billion" calculation.
Convert this to the newer .adjfine, using the recently added
adjust_by_scaled_ppm helper function.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Cc: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Cc: Alexandre Torgue <alexandre.torgue@foss.st.com>
Cc: Jose Abreu <joabreu@synopsys.com>
Cc: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jacob Keller [Wed, 9 Nov 2022 23:09:40 +0000 (15:09 -0800)]
ptp: hclge: convert .adjfreq to .adjfine
The hclge implementation of .adjfreq is implemented in terms of a
straight forward "base * ppb / 1 billion" calculation.
Convert this to the newer .adjfine, using the recently added
adjust_by_scaled_ppm helper function.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Cc: Yisen Zhuang <yisen.zhuang@huawei.com>
Cc: Salil Mehta <salil.mehta@huawei.com>
Cc: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jacob Keller [Wed, 9 Nov 2022 23:09:39 +0000 (15:09 -0800)]
ptp: tg3: convert .adjfreq to .adjfine
The tg3 implementation of .adjfreq is implemented in terms of a
straight forward "base * ppb / 1 billion" calculation.
Convert this to the newer .adjfine, using the recently added
diff_by_scaled_ppm helper function to calculate the difference and
direction of the adjustment.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Cc: Siva Reddy Kallam <siva.kallam@broadcom.com>
Cc: Prashant Sreedharan <prashant@broadcom.com>
Cc: Michael Chan <mchan@broadcom.com>
Cc: Richard Cochran <richardcochran@gmail.com>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jacob Keller [Wed, 9 Nov 2022 23:09:38 +0000 (15:09 -0800)]
ptp_ixp46x: convert .adjfreq to .adjfine
The ptp_ixp46x implementation of .adjfreq is implemented in terms of a
straight forward "base * ppb / 1 billion" calculation.
Convert this to the newer .adjfine, using the recently added
adjust_by_scaled_ppm helper function.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Linus Walleij <linus.walleij@linaro.org>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jacob Keller [Wed, 9 Nov 2022 23:09:37 +0000 (15:09 -0800)]
ptp_phc: convert .adjfreq to .adjfine
The ptp_phc implementation of .adjfreq is implemented in terms of a
straight forward "base * ppb / 1 billion" calculation.
Convert this to the newer .adjfine, updating the driver to use the recently
introduced adjust_by_scaled_ppm helper function.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Cc: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 11 Nov 2022 10:52:55 +0000 (10:52 +0000)]
Merge branch 'marvell-prestera-AC5X-support'
Oleksandr Mazur says:
====================
net: marvell: prestera: pci: add support for AC5X family devices
This patch series introduces a support for AC5X family devices.
AC5X devices utilize arm64 CPUs, and thus require a new FW (arm64-one)
to be loaded. The new FW-image for AC5X devices has been introduces in
the linux-firmware repo under the following commit:
60310c2deb8c ("Merge branch 'prestera-v4.1' of
https://github.com/PLVision/linux-firmware")
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Oleksandr Mazur [Wed, 9 Nov 2022 22:25:22 +0000 (00:25 +0200)]
net: marvell: prestera: pci: bump supported FW min version
Bump MIN version to reflect support of new platform (AC5X family devices).
Signed-off-by: Oleksandr Mazur <oleksandr.mazur@plvision.eu>
Signed-off-by: David S. Miller <davem@davemloft.net>
Maksym Glubokiy [Wed, 9 Nov 2022 22:25:21 +0000 (00:25 +0200)]
net: marvell: prestera: pci: add support for AC5X family devices
Add support for the following AC5x Marvell Prestera PP family devices:
98DX7312M (12x25G / 8x25G + 1x100G);
98DX3500 (24x1G + 6x25G);
98DX3501 (16x1G + 6x10G);
98DX3510 (48x1G + 6x25G);
98DX3520 (24x2.5G + 6x25G);
Known issues:
- FW reload doesn't work (rmmod/modprobe sequence).
Co-developed-by: Vadym Kochan <vadym.kochan@plvision.eu>
Signed-off-by: Vadym Kochan <vadym.kochan@plvision.eu>
Signed-off-by: Maksym Glubokiy <maksym.glubokiy@plvision.eu>
Signed-off-by: Oleksandr Mazur <oleksandr.mazur@plvision.eu>
Signed-off-by: David S. Miller <davem@davemloft.net>
Oleksandr Mazur [Wed, 9 Nov 2022 22:25:20 +0000 (00:25 +0200)]
net: marvell: prestera: pci: use device-id defines
Use defines with proper device names instead of device-id in pci-devices
listing.
Signed-off-by: Oleksandr Mazur <oleksandr.mazur@plvision.eu>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 11 Nov 2022 10:49:34 +0000 (10:49 +0000)]
Merge branch 'lan966x-xdp'
Horatiu Vultur says:
====================
net: lan966x: Add xdp support
Add support for xdp in lan966x driver. Currently only XDP_PASS and
XDP_DROP are supported.
The first 2 patches are just moving things around just to simplify
the code for when the xdp is added.
Patch 3 actually adds the xdp. Currently the only supported actions
are XDP_PASS and XDP_DROP. In the future this will be extended with
XDP_TX and XDP_REDIRECT.
Patch 4 changes to use page pool API, because the handling of the
pages is similar with what already lan966x driver is doing. In this
way is possible to remove some of the code.
All these changes give a small improvement on the RX side:
Before:
iperf3 -c 10.96.10.1 -R
[ 5] 0.00-10.01 sec 514 MBytes 430 Mbits/sec 0 sender
[ 5] 0.00-10.00 sec 509 MBytes 427 Mbits/sec receiver
After:
iperf3 -c 10.96.10.1 -R
[ 5] 0.00-10.02 sec 540 MBytes 452 Mbits/sec 0 sender
[ 5] 0.00-10.01 sec 537 MBytes 450 Mbits/sec receiver
---
v2->v3:
- inline lan966x_xdp_port_present
- update max_len of page_pool_params not to be the page size anymore but
actually be rx->max_mtu.
v1->v2:
- rebase on net-next, once the fixes for FDMA and MTU were accepted
- drop patch 2, which changes the MTU as is not needed anymore
- allow to run xdp programs on frames bigger than 4KB
====================
Signed-off-by: David S. Miller <davem@davemloft.net>