Alexei Starovoitov [Tue, 27 Apr 2021 22:50:15 +0000 (15:50 -0700)]
Merge branch 'Implement formatted output helpers with bstr_printf'
Florent Revest says:
====================
BPF's formatted output helpers are currently implemented with
snprintf-like functions which use variadic arguments. The types of all
arguments need to be known at compilation time. BPF_CAST_FMT_ARG casts
all arguments to the size they should be (known at runtime), but the C
type promotion rules cast them back to u64s. On 32 bit architectures,
this can cause misaligned va_lists and generate mangled output.
This series refactors these helpers to avoid variadic arguments. It uses
a "binary printf" instead, where arguments are passed in a buffer
constructed at runtime.
---
Changes in v2:
- Reworded the second patch's description to better describe how
arguments get mangled on 32 bit architectures
====================
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Florent Revest [Tue, 27 Apr 2021 17:43:13 +0000 (19:43 +0200)]
bpf: Implement formatted output helpers with bstr_printf
BPF has three formatted output helpers: bpf_trace_printk, bpf_seq_printf
and bpf_snprintf. Their signatures specify that all arguments are
provided from the BPF world as u64s (in an array or as registers). All
of these helpers are currently implemented by calling functions such as
snprintf() whose signatures take a variable number of arguments, then
placed in a va_list by the compiler to call vsnprintf().
"
d9c9e4db bpf: Factorize bpf_trace_printk and bpf_seq_printf" introduced
a bpf_printf_prepare function that fills an array of u64 sanitized
arguments with an array of "modifiers" which indicate what the "real"
size of each argument should be (given by the format specifier). The
BPF_CAST_FMT_ARG macro consumes these arrays and casts each argument to
its real size. However, the C promotion rules implicitely cast them all
back to u64s. Therefore, the arguments given to snprintf are u64s and
the va_list constructed by the compiler will use 64 bits for each
argument. On 64 bit machines, this happens to work well because 32 bit
arguments in va_lists need to occupy 64 bits anyway, but on 32 bit
architectures this breaks the layout of the va_list expected by the
called function and mangles values.
In "
88a5c690b6 bpf: fix bpf_trace_printk on 32 bit archs", this problem
had been solved for bpf_trace_printk only with a "horrid workaround"
that emitted multiple calls to trace_printk where each call had
different argument types and generated different va_list layouts. One of
the call would be dynamically chosen at runtime. This was ok with the 3
arguments that bpf_trace_printk takes but bpf_seq_printf and
bpf_snprintf accept up to 12 arguments. Because this approach scales
code exponentially, it is not a viable option anymore.
Because the promotion rules are part of the language and because the
construction of a va_list is an arch-specific ABI, it's best to just
avoid variadic arguments and va_lists altogether. Thankfully the
kernel's snprintf() has an alternative in the form of bstr_printf() that
accepts arguments in a "binary buffer representation". These binary
buffers are currently created by vbin_printf and used in the tracing
subsystem to split the cost of printing into two parts: a fast one that
only dereferences and remembers values, and a slower one, called later,
that does the pretty-printing.
This patch refactors bpf_printf_prepare to construct binary buffers of
arguments consumable by bstr_printf() instead of arrays of arguments and
modifiers. This gets rid of BPF_CAST_FMT_ARG and greatly simplifies the
bpf_printf_prepare usage but there are a few gotchas that change how
bpf_printf_prepare needs to do things.
Currently, bpf_printf_prepare uses a per cpu temporary buffer as a
generic storage for strings and IP addresses. With this refactoring, the
temporary buffers now holds all the arguments in a structured binary
format.
To comply with the format expected by bstr_printf, certain format
specifiers also need to be pre-formatted: %pB and %pi6/%pi4/%pI4/%pI6.
Because vsnprintf subroutines for these specifiers are hard to expose,
we pre-format these arguments with calls to snprintf().
Reported-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Florent Revest <revest@chromium.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210427174313.860948-3-revest@chromium.org
Florent Revest [Tue, 27 Apr 2021 17:43:12 +0000 (19:43 +0200)]
seq_file: Add a seq_bprintf function
Similarly to seq_buf_bprintf in lib/seq_buf.c, this function writes a
printf formatted string with arguments provided in a "binary
representation" built by functions such as vbin_printf.
Signed-off-by: Florent Revest <revest@chromium.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210427174313.860948-2-revest@chromium.org
Hengqi Chen [Sat, 24 Apr 2021 02:12:08 +0000 (10:12 +0800)]
bpf, docs: Fix literal block for example code
Add a missing colon so that the code block followed can be rendered
properly.
Signed-off-by: Hengqi Chen <hengqi.chen@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20210424021208.832116-1-hengqi.chen@gmail.com
Lorenzo Bianconi [Fri, 23 Apr 2021 09:27:27 +0000 (11:27 +0200)]
bpf, cpumap: Bulk skb using netif_receive_skb_list
Rely on netif_receive_skb_list routine to send skbs converted from
xdp_frames in cpu_map_kthread_run in order to improve i-cache usage.
The proposed patch has been tested running xdp_redirect_cpu bpf sample
available in the kernel tree that is used to redirect UDP frames from
ixgbe driver to a cpumap entry and then to the networking stack. UDP
frames are generated using pktgen. Packets are discarded by the UDP
layer.
$ xdp_redirect_cpu --cpu <cpu> --progname xdp_cpu_map0 --dev <eth>
bpf-next: ~2.35Mpps
bpf-next + cpumap skb-list: ~2.72Mpps
Rename drops counter in kmem_alloc_drops since now it reports just
kmem_cache_alloc_bulk failures
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Link: https://lore.kernel.org/bpf/c729f83e5d7482d9329e0f165bdbe5adcefd1510.1619169700.git.lorenzo@kernel.org
Daniel Borkmann [Fri, 23 Apr 2021 13:59:55 +0000 (13:59 +0000)]
bpf: Fix propagation of 32 bit unsigned bounds from 64 bit bounds
Similarly as
b02709587ea3 ("bpf: Fix propagation of 32-bit signed bounds
from 64-bit bounds."), we also need to fix the propagation of 32 bit
unsigned bounds from 64 bit counterparts. That is, really only set the
u32_{min,max}_value when /both/ {umin,umax}_value safely fit in 32 bit
space. For example, the register with a umin_value == 1 does /not/ imply
that u32_min_value is also equal to 1, since umax_value could be much
larger than 32 bit subregister can hold, and thus u32_min_value is in
the interval [0,1] instead.
Before fix, invalid tracking result of R2_w=inv1:
[...]
5: R0_w=inv1337 R1=ctx(id=0,off=0,imm=0) R2_w=inv(id=0) R10=fp0
5: (35) if r2 >= 0x1 goto pc+1
[...] // goto path
7: R0=inv1337 R1=ctx(id=0,off=0,imm=0) R2=inv(id=0,umin_value=1) R10=fp0
7: (b6) if w2 <= 0x1 goto pc+1
[...] // goto path
9: R0=inv1337 R1=ctx(id=0,off=0,imm=0) R2=inv(id=0,smin_value=-
9223372036854775807,smax_value=
9223372032559808513,umin_value=1,umax_value=
18446744069414584321,var_off=(0x1; 0xffffffff00000000),s32_min_value=1,s32_max_value=1,u32_max_value=1) R10=fp0
9: (bc) w2 = w2
10: R0=inv1337 R1=ctx(id=0,off=0,imm=0) R2_w=inv1 R10=fp0
[...]
After fix, correct tracking result of R2_w=inv(id=0,umax_value=1,var_off=(0x0; 0x1)):
[...]
5: R0_w=inv1337 R1=ctx(id=0,off=0,imm=0) R2_w=inv(id=0) R10=fp0
5: (35) if r2 >= 0x1 goto pc+1
[...] // goto path
7: R0=inv1337 R1=ctx(id=0,off=0,imm=0) R2=inv(id=0,umin_value=1) R10=fp0
7: (b6) if w2 <= 0x1 goto pc+1
[...] // goto path
9: R0=inv1337 R1=ctx(id=0,off=0,imm=0) R2=inv(id=0,smax_value=
9223372032559808513,umax_value=
18446744069414584321,var_off=(0x0; 0xffffffff00000001),s32_min_value=0,s32_max_value=1,u32_max_value=1) R10=fp0
9: (bc) w2 = w2
10: R0=inv1337 R1=ctx(id=0,off=0,imm=0) R2_w=inv(id=0,umax_value=1,var_off=(0x0; 0x1)) R10=fp0
[...]
Thus, same issue as in
b02709587ea3 holds for unsigned subregister tracking.
Also, align __reg64_bound_u32() similarly to __reg64_bound_s32() as done in
b02709587ea3 to make them uniform again.
Fixes:
3f50f132d840 ("bpf: Verifier, do explicit ALU32 bounds tracking")
Reported-by: Manfred Paul (@_manfp)
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Florent Revest [Tue, 27 Apr 2021 11:29:58 +0000 (13:29 +0200)]
bpf: Lock bpf_trace_printk's tmp buf before it is written to
bpf_trace_printk uses a shared static buffer to hold strings before they
are printed. A recent refactoring moved the locking of that buffer after
it gets filled by mistake.
Fixes:
d9c9e4db186a ("bpf: Factorize bpf_trace_printk and bpf_seq_printf")
Reported-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Florent Revest <revest@chromium.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210427112958.773132-1-revest@chromium.org
Alexei Starovoitov [Tue, 27 Apr 2021 01:37:14 +0000 (18:37 -0700)]
Merge branch 'CO-RE relocation selftests fixes'
Andrii Nakryiko says:
====================
Lorenz Bauer noticed that core_reloc selftest has two inverted CHECK()
conditions, allowing failing tests to pass unnoticed. Fixing that opened up
few long-standing (field existence and direct memory bitfields) and one recent
failures (BTF_KIND_FLOAT relos).
This patch set fixes core_reloc selftest to capture such failures reliably in
the future. It also fixes all the newly failing tests. See individual patches
for details.
This patch set also completes a set of ASSERT_xxx() macros, so now there
should be a very little reason to use verbose and error-prone generic CHECK()
macro.
v1->v2:
- updated bpf_core_fields_are_compat() comment to mention FLOAT (Lorenz).
Cc: Lorenz Bauer <lmb@cloudflare.com>
====================
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Andrii Nakryiko [Mon, 26 Apr 2021 19:29:49 +0000 (12:29 -0700)]
selftests/bpf: Fix core_reloc test runner
Fix failed tests checks in core_reloc test runner, which allowed failing tests
to pass quietly. Also add extra check to make sure that expected to fail test cases with
invalid names are caught as test failure anyway, as this is not an expected
failure mode. Also fix mislabeled probed vs direct bitfield test cases.
Fixes:
124a892d1c41 ("selftests/bpf: Test TYPE_EXISTS and TYPE_SIZE CO-RE relocations")
Reported-by: Lorenz Bauer <lmb@cloudflare.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Lorenz Bauer <lmb@cloudflare.com>
Link: https://lore.kernel.org/bpf/20210426192949.416837-6-andrii@kernel.org
Andrii Nakryiko [Mon, 26 Apr 2021 19:29:48 +0000 (12:29 -0700)]
selftests/bpf: Fix field existence CO-RE reloc tests
Negative field existence cases for have a broken assumption that FIELD_EXISTS
CO-RE relo will fail for fields that match the name but have incompatible type
signature. That's not how CO-RE relocations generally behave. Types and fields
that match by name but not by expected type are treated as non-matching
candidates and are skipped. Error later is reported if no matching candidate
was found. That's what happens for most relocations, but existence relocations
(FIELD_EXISTS and TYPE_EXISTS) are more permissive and they are designed to
return 0 or 1, depending if a match is found. This allows to handle
name-conflicting but incompatible types in BPF code easily. Combined with
___flavor suffixes, it's possible to handle pretty much any structural type
changes in kernel within the compiled once BPF source code.
So, long story short, negative field existence test cases are invalid in their
assumptions, so this patch reworks them into a single consolidated positive
case that doesn't match any of the fields.
Fixes:
c7566a69695c ("selftests/bpf: Add field existence CO-RE relocs tests")
Reported-by: Lorenz Bauer <lmb@cloudflare.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Lorenz Bauer <lmb@cloudflare.com>
Link: https://lore.kernel.org/bpf/20210426192949.416837-5-andrii@kernel.org
Andrii Nakryiko [Mon, 26 Apr 2021 19:29:47 +0000 (12:29 -0700)]
selftests/bpf: Fix BPF_CORE_READ_BITFIELD() macro
Fix BPF_CORE_READ_BITFIELD() macro used for reading CO-RE-relocatable
bitfields. Missing breaks in a switch caused 8-byte reads always. This can
confuse libbpf because it does strict checks that memory load size corresponds
to the original size of the field, which in this case quite often would be
wrong.
After fixing that, we run into another problem, which quite subtle, so worth
documenting here. The issue is in Clang optimization and CO-RE relocation
interactions. Without that asm volatile construct (also known as
barrier_var()), Clang will re-order BYTE_OFFSET and BYTE_SIZE relocations and
will apply BYTE_OFFSET 4 times for each switch case arm. This will result in
the same error from libbpf about mismatch of memory load size and original
field size. I.e., if we were reading u32, we'd still have *(u8 *), *(u16 *),
*(u32 *), and *(u64 *) memory loads, three of which will fail. Using
barrier_var() forces Clang to apply BYTE_OFFSET relocation first (and once) to
calculate p, after which value of p is used without relocation in each of
switch case arms, doing appropiately-sized memory load.
Here's the list of relevant relocations and pieces of generated BPF code
before and after this patch for test_core_reloc_bitfields_direct selftests.
BEFORE
=====
#45: core_reloc: insn #160 --> [5] + 0:5: byte_sz --> struct core_reloc_bitfields.u32
#46: core_reloc: insn #167 --> [5] + 0:5: byte_off --> struct core_reloc_bitfields.u32
#47: core_reloc: insn #174 --> [5] + 0:5: byte_off --> struct core_reloc_bitfields.u32
#48: core_reloc: insn #178 --> [5] + 0:5: byte_off --> struct core_reloc_bitfields.u32
#49: core_reloc: insn #182 --> [5] + 0:5: byte_off --> struct core_reloc_bitfields.u32
157: 18 02 00 00 00 00 00 00 00 00 00 00 00 00 00 00 r2 = 0 ll
159: 7b 12 20 01 00 00 00 00 *(u64 *)(r2 + 288) = r1
160: b7 02 00 00 04 00 00 00 r2 = 4
; BYTE_SIZE relocation here ^^^
161: 66 02 07 00 03 00 00 00 if w2 s> 3 goto +7 <LBB0_63>
162: 16 02 0d 00 01 00 00 00 if w2 == 1 goto +13 <LBB0_65>
163: 16 02 01 00 02 00 00 00 if w2 == 2 goto +1 <LBB0_66>
164: 05 00 12 00 00 00 00 00 goto +18 <LBB0_69>
0000000000000528 <LBB0_66>:
165: 18 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 r1 = 0 ll
167: 69 11 08 00 00 00 00 00 r1 = *(u16 *)(r1 + 8)
; BYTE_OFFSET relo here w/ WRONG size ^^^^^^^^^^^^^^^^
168: 05 00 0e 00 00 00 00 00 goto +14 <LBB0_69>
0000000000000548 <LBB0_63>:
169: 16 02 0a 00 04 00 00 00 if w2 == 4 goto +10 <LBB0_67>
170: 16 02 01 00 08 00 00 00 if w2 == 8 goto +1 <LBB0_68>
171: 05 00 0b 00 00 00 00 00 goto +11 <LBB0_69>
0000000000000560 <LBB0_68>:
172: 18 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 r1 = 0 ll
174: 79 11 08 00 00 00 00 00 r1 = *(u64 *)(r1 + 8)
; BYTE_OFFSET relo here w/ WRONG size ^^^^^^^^^^^^^^^^
175: 05 00 07 00 00 00 00 00 goto +7 <LBB0_69>
0000000000000580 <LBB0_65>:
176: 18 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 r1 = 0 ll
178: 71 11 08 00 00 00 00 00 r1 = *(u8 *)(r1 + 8)
; BYTE_OFFSET relo here w/ WRONG size ^^^^^^^^^^^^^^^^
179: 05 00 03 00 00 00 00 00 goto +3 <LBB0_69>
00000000000005a0 <LBB0_67>:
180: 18 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 r1 = 0 ll
182: 61 11 08 00 00 00 00 00 r1 = *(u32 *)(r1 + 8)
; BYTE_OFFSET relo here w/ RIGHT size ^^^^^^^^^^^^^^^^
00000000000005b8 <LBB0_69>:
183: 67 01 00 00 20 00 00 00 r1 <<= 32
184: b7 02 00 00 00 00 00 00 r2 = 0
185: 16 02 02 00 00 00 00 00 if w2 == 0 goto +2 <LBB0_71>
186: c7 01 00 00 20 00 00 00 r1 s>>= 32
187: 05 00 01 00 00 00 00 00 goto +1 <LBB0_72>
00000000000005e0 <LBB0_71>:
188: 77 01 00 00 20 00 00 00 r1 >>= 32
AFTER
=====
#30: core_reloc: insn #132 --> [5] + 0:5: byte_off --> struct core_reloc_bitfields.u32
#31: core_reloc: insn #134 --> [5] + 0:5: byte_sz --> struct core_reloc_bitfields.u32
129: 18 02 00 00 00 00 00 00 00 00 00 00 00 00 00 00 r2 = 0 ll
131: 7b 12 20 01 00 00 00 00 *(u64 *)(r2 + 288) = r1
132: b7 01 00 00 08 00 00 00 r1 = 8
; BYTE_OFFSET relo here ^^^
; no size check for non-memory dereferencing instructions
133: 0f 12 00 00 00 00 00 00 r2 += r1
134: b7 03 00 00 04 00 00 00 r3 = 4
; BYTE_SIZE relocation here ^^^
135: 66 03 05 00 03 00 00 00 if w3 s> 3 goto +5 <LBB0_63>
136: 16 03 09 00 01 00 00 00 if w3 == 1 goto +9 <LBB0_65>
137: 16 03 01 00 02 00 00 00 if w3 == 2 goto +1 <LBB0_66>
138: 05 00 0a 00 00 00 00 00 goto +10 <LBB0_69>
0000000000000458 <LBB0_66>:
139: 69 21 00 00 00 00 00 00 r1 = *(u16 *)(r2 + 0)
; NO CO-RE relocation here ^^^^^^^^^^^^^^^^
140: 05 00 08 00 00 00 00 00 goto +8 <LBB0_69>
0000000000000468 <LBB0_63>:
141: 16 03 06 00 04 00 00 00 if w3 == 4 goto +6 <LBB0_67>
142: 16 03 01 00 08 00 00 00 if w3 == 8 goto +1 <LBB0_68>
143: 05 00 05 00 00 00 00 00 goto +5 <LBB0_69>
0000000000000480 <LBB0_68>:
144: 79 21 00 00 00 00 00 00 r1 = *(u64 *)(r2 + 0)
; NO CO-RE relocation here ^^^^^^^^^^^^^^^^
145: 05 00 03 00 00 00 00 00 goto +3 <LBB0_69>
0000000000000490 <LBB0_65>:
146: 71 21 00 00 00 00 00 00 r1 = *(u8 *)(r2 + 0)
; NO CO-RE relocation here ^^^^^^^^^^^^^^^^
147: 05 00 01 00 00 00 00 00 goto +1 <LBB0_69>
00000000000004a0 <LBB0_67>:
148: 61 21 00 00 00 00 00 00 r1 = *(u32 *)(r2 + 0)
; NO CO-RE relocation here ^^^^^^^^^^^^^^^^
00000000000004a8 <LBB0_69>:
149: 67 01 00 00 20 00 00 00 r1 <<= 32
150: b7 02 00 00 00 00 00 00 r2 = 0
151: 16 02 02 00 00 00 00 00 if w2 == 0 goto +2 <LBB0_71>
152: c7 01 00 00 20 00 00 00 r1 s>>= 32
153: 05 00 01 00 00 00 00 00 goto +1 <LBB0_72>
00000000000004d0 <LBB0_71>:
154: 77 01 00 00 20 00 00 00 r1 >>= 323
Fixes:
ee26dade0e3b ("libbpf: Add support for relocatable bitfields")
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Lorenz Bauer <lmb@cloudflare.com>
Link: https://lore.kernel.org/bpf/20210426192949.416837-4-andrii@kernel.org
Andrii Nakryiko [Mon, 26 Apr 2021 19:29:46 +0000 (12:29 -0700)]
libbpf: Support BTF_KIND_FLOAT during type compatibility checks in CO-RE
Add BTF_KIND_FLOAT support when doing CO-RE field type compatibility check.
Without this, relocations against float/double fields will fail.
Also adjust one error message to emit instruction index instead of less
convenient instruction byte offset.
Fixes:
22541a9eeb0d ("libbpf: Add BTF_KIND_FLOAT support")
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Lorenz Bauer <lmb@cloudflare.com>
Link: https://lore.kernel.org/bpf/20210426192949.416837-3-andrii@kernel.org
Andrii Nakryiko [Mon, 26 Apr 2021 19:29:45 +0000 (12:29 -0700)]
selftests/bpf: Add remaining ASSERT_xxx() variants
Add ASSERT_TRUE/ASSERT_FALSE for conditions calculated with custom logic to
true/false. Also add remaining arithmetical assertions:
- ASSERT_LE -- less than or equal;
- ASSERT_GT -- greater than;
- ASSERT_GE -- greater than or equal.
This should cover most scenarios where people fall back to error-prone
CHECK()s.
Also extend ASSERT_ERR() to print out errno, in addition to direct error.
Also convert few CHECK() instances to ensure new ASSERT_xxx() variants work as
expected. Subsequent patch will also use ASSERT_TRUE/ASSERT_FALSE more
extensively.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Lorenz Bauer <lmb@cloudflare.com>
Link: https://lore.kernel.org/bpf/20210426192949.416837-2-andrii@kernel.org
Alexei Starovoitov [Mon, 26 Apr 2021 04:09:03 +0000 (21:09 -0700)]
Merge branch 'bpf: Tracing and lsm programs re-attach'
Jiri Olsa says:
====================
hi,
while adding test for pinning the module while there's
trampoline attach to it, I noticed that we don't allow
link detach and following re-attach for trampolines.
Adding that for tracing and lsm programs.
You need to have patch [1] from bpf tree for test module
attach test to pass.
v5 changes:
- fixed missing hlist_del_init change
- fixed several ASSERT calls
- added extra patch for missing ';'
- added ASSERT macros to lsm test
- added acks
thanks,
jirka
[1] https://lore.kernel.org/bpf/
20210326105900.151466-1-jolsa@kernel.org/
====================
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Jiri Olsa [Wed, 14 Apr 2021 19:51:47 +0000 (21:51 +0200)]
selftests/bpf: Use ASSERT macros in lsm test
Replacing CHECK with ASSERT macros.
Suggested-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210414195147.1624932-8-jolsa@kernel.org
Jiri Olsa [Wed, 14 Apr 2021 19:51:46 +0000 (21:51 +0200)]
selftests/bpf: Test that module can't be unloaded with attached trampoline
Adding test to verify that once we attach module's trampoline,
the module can't be unloaded.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210414195147.1624932-7-jolsa@kernel.org
Jiri Olsa [Wed, 14 Apr 2021 19:51:45 +0000 (21:51 +0200)]
selftests/bpf: Add re-attach test to lsm test
Adding the test to re-attach (detach/attach again) lsm programs,
plus check that already linked program can't be attached again.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210414195147.1624932-6-jolsa@kernel.org
Jiri Olsa [Wed, 14 Apr 2021 19:51:44 +0000 (21:51 +0200)]
selftests/bpf: Add re-attach test to fexit_test
Adding the test to re-attach (detach/attach again) tracing
fexit programs, plus check that already linked program can't
be attached again.
Also switching to ASSERT* macros.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210414195147.1624932-5-jolsa@kernel.org
Jiri Olsa [Wed, 14 Apr 2021 19:51:43 +0000 (21:51 +0200)]
selftests/bpf: Add re-attach test to fentry_test
Adding the test to re-attach (detach/attach again) tracing
fentry programs, plus check that already linked program can't
be attached again.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210414195147.1624932-4-jolsa@kernel.org
Jiri Olsa [Wed, 14 Apr 2021 19:51:41 +0000 (21:51 +0200)]
bpf: Allow trampoline re-attach for tracing and lsm programs
Currently we don't allow re-attaching of trampolines. Once
it's detached, it can't be re-attach even when the program
is still loaded.
Adding the possibility to re-attach the loaded tracing and
lsm programs.
Fixing missing unlock with proper cleanup goto jump reported
by Julia.
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Julia Lawall <julia.lawall@lip6.fr>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: KP Singh <kpsingh@kernel.org>
Link: https://lore.kernel.org/bpf/20210414195147.1624932-2-jolsa@kernel.org
David S. Miller [Mon, 26 Apr 2021 01:37:39 +0000 (18:37 -0700)]
Merge branch 'bnxt_en-next'
Michael Chan says:
====================
bnxt_en: Updates for net-next.
This series includes these main enhancements:
1. Link related changes
- add NRZ/PAM4 link signal mode to the link up message if known
- rely on firmware to bring down the link during ifdown
2. SRIOV related changes
- allow VF promiscuous mode if the VF is trusted
- allow ndo operations to configure VF when the PF is ifdown
- fix the scenario of the VF taking back control of it's MAC address
- add Hyper-V VF device IDs
3. Support the option to transmit without FCS/CRC.
4. Implement .ndo_features_check() to disable offload when the UDP
encap. packets are not supported.
v2: Patch10: Reverse the check for supported UDP ports to be more straight
forward.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Michael Chan [Sun, 25 Apr 2021 17:45:27 +0000 (13:45 -0400)]
bnxt_en: Implement .ndo_features_check().
For UDP encapsultions, we only support the offloaded Vxlan port and
Geneve port. All other ports included FOU and GUE are not supported so
we need to turn off TSO and checksum features.
v2: Reverse the check for supported UDP ports to be more straight forward.
Reviewed-by: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com>
Reviewed-by: Edwin Peer <edwin.peer@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Michael Chan [Sun, 25 Apr 2021 17:45:26 +0000 (13:45 -0400)]
bnxt_en: Support IFF_SUPP_NOFCS feature to transmit without ethernet FCS.
If firmware is capable, set the IFF_SUPP_NOFCS flag to support the
sockets option to transmit packets without FCS. This is mainly used
for testing.
Reviewed-by: Edwin Peer <edwin.peer@broadcom.com
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Michael Chan [Sun, 25 Apr 2021 17:45:25 +0000 (13:45 -0400)]
bnxt_en: Add PCI IDs for Hyper-V VF devices.
Support VF device IDs used by the Hyper-V hypervisor.
Reviewed-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Reviewed-by: Andy Gospodarek <gospo@broadcom.com>
Signed-off-by: Edwin Peer <edwin.peer@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Michael Chan [Sun, 25 Apr 2021 17:45:24 +0000 (13:45 -0400)]
bnxt_en: Call bnxt_approve_mac() after the PF gives up control of the VF MAC.
When the PF is no longer enforcing an assigned MAC address on a VF, the
VF needs to call bnxt_approve_mac() to tell the PF what MAC address it is
now using. Otherwise it gets out of sync and the PF won't know what
MAC address the VF wants to use. Ultimately the VF will fail when it
tries to setup the L2 MAC filter for the vnic.
Reviewed-by: Edwin Peer <edwin.peer@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Michael Chan [Sun, 25 Apr 2021 17:45:23 +0000 (13:45 -0400)]
bnxt_en: Move bnxt_approve_mac().
Move it before bnxt_update_vf_mac(). In the next patch, we need to call
bnxt_approve_mac() from bnxt_update_mac() under some conditions. This
will avoid forward declaration.
Reviewed-by: Edwin Peer <edwin.peer@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Edwin Peer [Sun, 25 Apr 2021 17:45:22 +0000 (13:45 -0400)]
bnxt_en: allow VF config ops when PF is closed
It is perfectly legal for the stack to query and configure VFs via PF
NDOs while the NIC is administratively down. Remove the unnecessary
check for the PF to be in open state.
Signed-off-by: Edwin Peer <edwin.peer@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Edwin Peer [Sun, 25 Apr 2021 17:45:21 +0000 (13:45 -0400)]
bnxt_en: allow promiscuous mode for trusted VFs
Firmware previously only allowed promiscuous mode for VFs associated with
a default VLAN. It is now possible to enable promiscuous mode for a VF
having no VLAN configured provided that it is trusted. In such cases the
VF will see all packets received by the PF, irrespective of destination
MAC or VLAN.
Note, it is necessary to query firmware at the time of bnxt_promisc_ok()
instead of in bnxt_hwrm_func_qcfg() because the trusted status might be
altered by the PF after the VF has been configured. This check must now
also be deferred because the firmware call sleeps.
Signed-off-by: Edwin Peer <edwin.peer@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Michael Chan [Sun, 25 Apr 2021 17:45:20 +0000 (13:45 -0400)]
bnxt_en: Add support for fw managed link down feature.
In the current code, the driver will not shutdown the link during
IFDOWN if there are still VFs sharing the port. Newer firmware will
manage the link down decision when the port is shared by VFs, so
we can just call firmware to shutdown the port unconditionally and
let firmware make the final decision.
Reviewed-by: Edwin Peer <edwin.peer@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Michael Chan [Sun, 25 Apr 2021 17:45:19 +0000 (13:45 -0400)]
bnxt_en: Add a new phy_flags field to the main driver structure.
Copy the phy related feature flags from the firmware call
HWRM_PORT_PHY_QCAPS to this new field. We can also remove the flags
field in the bnxt_test_info structure. It's cleaner to have all PHY
related flags in one location, directly copied from the firmware.
To keep the BNXT_PHY_CFG_ABLE() macro logic the same, we need to make
a slight adjustment to check that it is a PF.
Reviewed-by: Edwin Peer <edwin.peer@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Edwin Peer [Sun, 25 Apr 2021 17:45:18 +0000 (13:45 -0400)]
bnxt_en: report signal mode in link up messages
Firmware reports link signalling mode for certain speeds. In these
cases, print the signalling modes in kernel log link up messages.
Reviewed-by: Andy Gospodarek <gospo@broadcom.com>
Signed-off-by: Edwin Peer <edwin.peer@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jethro Beekman [Sun, 25 Apr 2021 09:22:03 +0000 (11:22 +0200)]
macvlan: Add nodst option to macvlan type source
The default behavior for source MACVLAN is to duplicate packets to
appropriate type source devices, and then do the normal destination MACVLAN
flow. This patch adds an option to skip destination MACVLAN processing if
any matching source MACVLAN device has the option set.
This allows setting up a "catch all" device for source MACVLAN: create one
or more devices with type source nodst, and one device with e.g. type vepa,
and incoming traffic will be received on exactly one device.
v2: netdev wants non-standard line length
Signed-off-by: Jethro Beekman <kernel@jbeekman.nl>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 26 Apr 2021 01:31:35 +0000 (18:31 -0700)]
Merge tag 'mlx5-updates-2021-04-21' of git://git./linux/kernel/git/saeed/linux
Saeed Mahameed says:
====================
mlx5-updates-2021-04-21
devlink external port attribute for SF (Sub-Function) port flavour
This adds the support to instantiate Sub-Functions on external hosts
E.g when Eswitch manager is enabled on the ARM SmarNic SoC CPU, users
are now able to spawn new Sub-Functions on the Host server CPU.
Parav Pandit Says:
==================
This series introduces and uses external attribute for the SF port to
indicate that a SF port belongs to an external controller.
This is needed to generate unique phys_port_name when PF and SF numbers
are overlapping between local and external controllers.
For example two controllers 0 and 1, both of these controller have a SF.
having PF number 0, SF number 77. Here, phys_port_name has duplicate
entry which doesn't have controller number in it.
Hence, add controller number optionally when a SF port is for an
external controller. This extension is similar to existing PF and VF
eswitch ports of the external controller.
When a SF is for external controller an example view of external SF
port and config sequence:
On eswitch system:
$ devlink dev eswitch set pci/0033:01:00.0 mode switchdev
$ devlink port show
pci/0033:01:00.0/196607: type eth netdev enP51p1s0f0np0 flavour physical port 0 splittable false
pci/0033:01:00.0/131072: type eth netdev eth0 flavour pcipf controller 1 pfnum 0 external true splittable false
function:
hw_addr 00:00:00:00:00:00
$ devlink port add pci/0033:01:00.0 flavour pcisf pfnum 0 sfnum 77 controller 1
pci/0033:01:00.0/163840: type eth netdev eth1 flavour pcisf controller 1 pfnum 0 sfnum 77 splittable false
function:
hw_addr 00:00:00:00:00:00 state inactive opstate detached
phys_port_name construction:
$ cat /sys/class/net/eth1/phys_port_name
c1pf0sf77
Patch summary:
First 3 patches prepares the eswitch to handle vports in more generic
way using xarray to lookup vport from its unique vport number.
Patch-1 returns maximum eswitch ports only when eswitch is enabled
Patch-2 prepares eswitch to return eswitch max ports from a struct
Patch-3 uses xarray for vport and representor lookup
Patch-4 considers SF for an additioanl range of SF vports
Patch-5 relies on SF hw table to check SF support
Patch-6 extends SF devlink port attribute for external flag
Patch-7 stores the per controller SF allocation attributes
Patch-8 uses SF function id for filtering events
Patch-9 uses helper for allocation and free
Patch-10 splits hw table into per controller table and generic one
Patch-11 extends sf table for additional range
==================
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Walleij [Sun, 25 Apr 2021 00:30:38 +0000 (02:30 +0200)]
net: ethernet: ixp4xx: Support device tree probing
This adds device tree probing to the IXP4xx ethernet
driver.
Add a platform data bool to tell us whether to
register an MDIO bus for the device or not, as well
as the corresponding NPE.
We need to drop the memory region request as part of
this since the OF core will request the memory for the
device.
Cc: Zoltan HERPAI <wigyori@uid0.hu>
Cc: Raylynn Knight <rayknight@me.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Walleij [Sun, 25 Apr 2021 00:30:37 +0000 (02:30 +0200)]
net: ethernet: ixp4xx: Retire ancient phy retrieveal
This driver was using a really dated way of obtaining the
phy by printing a string and using it with phy_connect().
Switch to using more reasonable modern interfaces.
Suggested-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Walleij [Sun, 25 Apr 2021 00:30:36 +0000 (02:30 +0200)]
net: ethernet: ixp4xx: Add DT bindings
This adds device tree bindings for the IXP4xx ethernet
controller with optional MDIO bridge.
Cc: Zoltan HERPAI <wigyori@uid0.hu>
Cc: Raylynn Knight <rayknight@me.com>
Cc: devicetree@vger.kernel.org
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Hayes Wang [Sat, 24 Apr 2021 06:09:03 +0000 (14:09 +0800)]
r8152: remove some bit operations
Remove DELL_TB_RX_AGG_BUG and LENOVO_MACPASSTHRU flags of rtl8152_flags.
They are only set when initializing and wouldn't be change. It is enough
to record them with variables.
Signed-off-by: Hayes Wang <hayeswang@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Dexuan Cui [Sat, 24 Apr 2021 01:12:35 +0000 (18:12 -0700)]
hv_netvsc: Make netvsc/VF binding check both MAC and serial number
Currently the netvsc/VF binding logic only checks the PCI serial number.
The Microsoft Azure Network Adapter (MANA) supports multiple net_device
interfaces (each such interface is called a "vPort", and has its unique
MAC address) which are backed by the same VF PCI device, so the binding
logic should check both the MAC address and the PCI serial number.
The change should not break any other existing VF drivers, because
Hyper-V NIC SR-IOV implementation requires the netvsc network
interface and the VF network interface have the same MAC address.
Co-developed-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Co-developed-by: Shachar Raindel <shacharr@microsoft.com>
Signed-off-by: Shachar Raindel <shacharr@microsoft.com>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Dexuan Cui <decui@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiapeng Chong [Fri, 23 Apr 2021 09:52:23 +0000 (17:52 +0800)]
ch_ktls: Remove redundant variable result
Variable result is being assigned a value from a calculation
however the variable is never read, so this redundant variable
can be removed.
Cleans up the following clang-analyzer warning:
drivers/net/ethernet/chelsio/inline_crypto/ch_ktls/chcr_ktls.c:1488:2:
warning: Value stored to 'pos' is never read
[clang-analyzer-deadcode.DeadStores].
drivers/net/ethernet/chelsio/inline_crypto/ch_ktls/chcr_ktls.c:876:3:
warning: Value stored to 'pos' is never read
[clang-analyzer-deadcode.DeadStores].
drivers/net/ethernet/chelsio/inline_crypto/ch_ktls/chcr_ktls.c:36:3:
warning: Value stored to 'start' is never read
[clang-analyzer-deadcode.DeadStores].
Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 26 Apr 2021 01:02:32 +0000 (18:02 -0700)]
Merge git://git./linux/kernel/git/bpf/bpf-next
Alexei Starovoitov says:
====================
pull-request: bpf-next 2021-04-23
The following pull-request contains BPF updates for your *net-next* tree.
We've added 69 non-merge commits during the last 22 day(s) which contain
a total of 69 files changed, 3141 insertions(+), 866 deletions(-).
The main changes are:
1) Add BPF static linker support for extern resolution of global, from Andrii.
2) Refine retval for bpf_get_task_stack helper, from Dave.
3) Add a bpf_snprintf helper, from Florent.
4) A bunch of miscellaneous improvements from many developers.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Parav Pandit [Mon, 8 Mar 2021 11:19:53 +0000 (13:19 +0200)]
net/mlx5: SF, Extend SF table for additional SF id range
Extended the SF table to cover additioanl SF id range of external
controller.
A user optionallly provides the external controller number when user
wants to create SF on the external controller.
An example on eswitch system:
$ devlink dev eswitch set pci/0033:01:00.0 mode switchdev
$ devlink port show
pci/0033:01:00.0/196607: type eth netdev enP51p1s0f0np0 flavour physical port 0 splittable false
pci/0033:01:00.0/131072: type eth netdev eth0 flavour pcipf controller 1 pfnum 0 external true splittable false
function:
hw_addr 00:00:00:00:00:00
$ devlink port add pci/0033:01:00.0 flavour pcisf pfnum 0 sfnum 77 controller 1
pci/0033:01:00.0/163840: type eth netdev eth1 flavour pcisf controller 1 pfnum 0 sfnum 77 external true splittable false
function:
hw_addr 00:00:00:00:00:00 state inactive opstate detached
Signed-off-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Parav Pandit [Fri, 5 Mar 2021 08:06:06 +0000 (10:06 +0200)]
net/mlx5: SF, Split mlx5_sf_hw_table into two parts
Device has SF ids in two different contiguous ranges. One for the local
controller and second for the external controller's PF.
Each such range has its own maximum number of functions and base id.
To allocate SF from either of the range, prepare code to split into
range specific fields into its own structure.
Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Vu Pham <vuhuong@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Parav Pandit [Fri, 5 Mar 2021 07:35:21 +0000 (09:35 +0200)]
net/mlx5: SF, Use helpers for allocation and free
Use helper routines for SF id and SF table allocation and free
so that subsequent patch can reuse it for multiple SF function
id range.
Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Vu Pham <vuhuong@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Parav Pandit [Thu, 11 Mar 2021 11:51:38 +0000 (13:51 +0200)]
net/mlx5: SF, Consider own vhca events of SF devices
Vhca events on eswitch manager are received for all the functions on the
NIC, including for SFs of external host PF controllers.
While SF device handler is only interested in SF devices events related
to its own PF.
Hence, validate if the function belongs to self or not.
Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Vu Pham <vuhuong@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Parav Pandit [Fri, 5 Mar 2021 06:51:10 +0000 (08:51 +0200)]
net/mlx5: SF, Store and use start function id
SF ids in the device are in two different contiguous ranges. One for
the local controller and second for the external host controller.
Prepare code to handle multiple start function id by storing it in the
table.
Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Vu Pham <vuhuong@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Parav Pandit [Wed, 10 Mar 2021 13:35:03 +0000 (15:35 +0200)]
devlink: Extend SF port attributes to have external attribute
Extended SF port attributes to have optional external flag similar to
PCI PF and VF port attributes.
External atttibute is required to generate unique phys_port_name when PF number
and SF number are overlapping between two controllers similar to SR-IOV
VFs.
When a SF is for external controller an example view of external SF
port and config sequence.
On eswitch system:
$ devlink dev eswitch set pci/0033:01:00.0 mode switchdev
$ devlink port show
pci/0033:01:00.0/196607: type eth netdev enP51p1s0f0np0 flavour physical port 0 splittable false
pci/0033:01:00.0/131072: type eth netdev eth0 flavour pcipf controller 1 pfnum 0 external true splittable false
function:
hw_addr 00:00:00:00:00:00
$ devlink port add pci/0033:01:00.0 flavour pcisf pfnum 0 sfnum 77 controller 1
pci/0033:01:00.0/163840: type eth netdev eth1 flavour pcisf controller 1 pfnum 0 sfnum 77 splittable false
function:
hw_addr 00:00:00:00:00:00 state inactive opstate detached
phys_port_name construction:
$ cat /sys/class/net/eth1/phys_port_name
c1pf0sf77
Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Vu Pham <vuhuong@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Parav Pandit [Mon, 8 Mar 2021 09:18:53 +0000 (11:18 +0200)]
net/mlx5: SF, Rely on hw table for SF devlink port allocation
Supporting SF allocation is currently checked at two places:
(a) SF devlink port allocator and
(b) SF HW table handler.
Both layers are using HCA CAP to identify it using helper routine
mlx5_sf_supported() and mlx5_sf_max_functions().
Instead, rely on the HW table handler to check if SF is supported
or not.
Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Vu Pham <vuhuong@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Parav Pandit [Tue, 2 Mar 2021 12:20:21 +0000 (14:20 +0200)]
net/mlx5: E-Switch, Consider SF ports of host PF
Query SF vports count and base id of host PF from the firmware.
Account these ports in the total port calculation whenever it is non
zero.
Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Vu Pham <vuhuong@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Parav Pandit [Fri, 19 Mar 2021 03:21:31 +0000 (05:21 +0200)]
net/mlx5: E-Switch, Use xarray for vport number to vport and rep mapping
Currently vport number to vport and its representor are mapped using an
array and an index.
Vport numbers of different types of functions are not contiguous. Adding
new such discontiguous range using index and number mapping is increasingly
complex and hard to maintain.
Hence, maintain an xarray of vport and rep whose lookup is done based on
the vport number.
Each VF and SF entry is marked with a xarray mark to identify the function
type. Additionally PF and VF needs special handling for legacy inline
mode. They are additionally marked as host function using additional
HOST_FN mark.
Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Vu Pham <vuhuong@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Parav Pandit [Tue, 2 Mar 2021 12:10:49 +0000 (14:10 +0200)]
net/mlx5: E-Switch, Prepare to return total vports from eswitch struct
Total vports are already stored during eswitch initialization. Instead
of calculating everytime, read directly from eswitch.
Additionally, host PF's SF vport information is available using
QUERY_HCA_CAP command. It is not available through HCA_CAP of the
eswitch manager PF.
Hence, this patch prepares the return total eswitch vport count from the
existing eswitch struct.
This further helps to keep eswitch port counting macros and logic within
eswitch.
Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Parav Pandit [Tue, 2 Mar 2021 11:54:42 +0000 (13:54 +0200)]
net/mlx5: E-Switch, Return eswitch max ports when eswitch is supported
mlx5_eswitch_get_total_vports() doesn't honor MLX5_ESWICH Kconfig flag.
When MLX5_ESWITCH is disabled, FS layer continues to initialize eswitch
specific ACL namespaces.
Instead, start honoring MLX5_ESWITCH flag and perform vport specific
initialization only when vport count is non zero.
Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Vu Pham <vuhuong@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Tiezhu Yang [Fri, 23 Apr 2021 01:23:30 +0000 (09:23 +0800)]
bpf: Document the pahole release info related to libbpf in bpf_devel_QA.rst
pahole starts to use libbpf definitions and APIs since v1.13 after the
commit
21507cd3e97b ("pahole: add libbpf as submodule under lib/bpf").
It works well with the git repository because the libbpf submodule will
use "git submodule update --init --recursive" to update.
Unfortunately, the default github release source code does not contain
libbpf submodule source code and this will cause build issues, the tarball
from https://git.kernel.org/pub/scm/devel/pahole/pahole.git/ is same with
github, you can get the source tarball with corresponding libbpf submodule
codes from
https://fedorapeople.org/~acme/dwarves
This change documents the above issues to give more information so that
we can get the tarball from the right place, early discussion is here:
https://lore.kernel.org/bpf/
2de4aad5-fa9e-1c39-3c92-
9bb9229d0966@loongson.cn/
Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Reviewed-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Link: https://lore.kernel.org/bpf/1619141010-12521-1-git-send-email-yangtiezhu@loongson.cn
Radu Pirea (NXP OSS) [Fri, 23 Apr 2021 15:00:50 +0000 (18:00 +0300)]
phy: nxp-c45-tja11xx: add interrupt support
Added .config_intr and .handle_interrupt callbacks.
Link event interrupt will trigger an interrupt every time when the link
goes up or down.
Signed-off-by: Radu Pirea (NXP OSS) <radu-nicolae.pirea@oss.nxp.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Colin Ian King [Fri, 23 Apr 2021 13:28:36 +0000 (14:28 +0100)]
net/atm: Fix spelling mistake "requed" -> "requeued"
There is a spelling mistake in a printk message. Fix it.
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Po-Hsu Lin [Fri, 23 Apr 2021 11:15:38 +0000 (19:15 +0800)]
selftests/net: bump timeout to 5 minutes
We found that with the latest mainline kernel (5.12.0-051200rc8) on
some KVM instances / bare-metal systems, the following tests will take
longer than the kselftest framework default timeout (45 seconds) to
run and thus got terminated with TIMEOUT error:
* xfrm_policy.sh - took about 2m20s
* pmtu.sh - took about 3m5s
* udpgso_bench.sh - took about 60s
Bump the timeout setting to 5 minutes to allow them have a chance to
finish.
https://bugs.launchpad.net/bugs/1856010
Signed-off-by: Po-Hsu Lin <po-hsu.lin@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 23 Apr 2021 21:06:32 +0000 (14:06 -0700)]
Merge branch 'mptcp-msg-flags'
Mat Martineau says:
====================
mptcp: Compatibility with common msg flags
These patches from the MPTCP tree handle some of the msg flags that are
typically used with TCP, to make it easier to adapt userspace programs
for use with MPTCP.
Patches 1, 2, and 4 add support for MSG_ERRQUEUE (no-op for now),
MSG_TRUNC, and MSG_PEEK on the receive side.
Patch 3 ignores unsupported msg flags for send and receive.
Patch 5 adds a selftest for MSG_PEEK.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Yonglong Li [Fri, 23 Apr 2021 18:17:09 +0000 (11:17 -0700)]
selftests: mptcp: add a test case for MSG_PEEK
Extend mptcp_connect tool with MSG_PEEK support and add a test case in
mptcp_connect.sh that checks the data received from/after recv() with
MSG_PEEK.
Acked-by: Paolo Abeni <pabeni@redhat.com>
Co-developed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Yonglong Li <liyonglong@chinatelecom.cn>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Yonglong Li [Fri, 23 Apr 2021 18:17:08 +0000 (11:17 -0700)]
mptcp: add MSG_PEEK support
This patch adds support for MSG_PEEK flag. Packets are not removed
from the receive_queue if MSG_PEEK set in recv() system call.
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Yonglong Li <liyonglong@chinatelecom.cn>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Paolo Abeni [Fri, 23 Apr 2021 18:17:07 +0000 (11:17 -0700)]
mptcp: ignore unsupported msg flags
Currently mptcp_sendmsg() fails with EOPNOTSUPP if the
user-space provides some unsupported flag. That is unexpected
and may foul existing applications migrated to MPTCP, which
expect a different behavior.
Change the mentioned function to silently ignore the unsupported
flags except MSG_FASTOPEN. This is the only flags currently not
supported by MPTCP with user-space visible side-effects.
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/162
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Paolo Abeni [Fri, 23 Apr 2021 18:17:06 +0000 (11:17 -0700)]
mptcp: implement MSG_TRUNC support
The mentioned flag is currently silenlty ignored. This
change implements the TCP-like behaviour, dropping the
pending data up to the specified length.
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Sigend-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Paolo Abeni [Fri, 23 Apr 2021 18:17:05 +0000 (11:17 -0700)]
mptcp: implement dummy MSG_ERRQUEUE support
mptcp_recvmsg() currently silently ignores MSG_ERRQUEUE, returning
input data instead of error cmsg.
This change provides a dummy implementation for MSG_ERRQUEUE - always
returns no data. That is consistent with the current lack of a suitable
IP_RECVERR setsockopt() support.
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Alexei Starovoitov [Fri, 23 Apr 2021 21:05:28 +0000 (14:05 -0700)]
Merge branch 'BPF static linker: support externs'
Andrii Nakryiko says:
====================
Add BPF static linker support for extern resolution of global variables,
functions, and BTF-defined maps.
This patch set consists of 4 parts:
- few patches are extending bpftool to simplify working with BTF dump;
- libbpf object loading logic is extended to support __hidden functions and
overriden (unused) __weak functions; also BTF-defined map parsing logic is
refactored to be re-used by linker;
- the crux of the patch set is BPF static linker logic extension to perform
extern resolution for three categories: global variables, BPF
sub-programs, BTF-defined maps;
- a set of selftests that validate that all the combinations of
extern/weak/__hidden are working as expected.
See respective patches for more details.
One aspect hasn't been addressed yet and is going to be resolved in the next
patch set, but is worth mentioning. With BPF static linking of multiple .o
files, dealing with static everything becomes more problematic for BPF
skeleton and in general for any by name look up APIs. This is due to static
entities are allowed to have non-unique name. Historically this was never
a problem due to BPF programs were always confined to a single C file. That
changes now and needs to be addressed. The thinking so far is for BPF static
linker to prepend filename to each static variable and static map (which is
currently not supported by libbpf, btw), so that they can be unambiguously
resolved by (mostly) unique name. Mostly, because even filenames can be
duplicated, but that should be rare and easy to address by wiser choice of
filenames by users. Fortunately, static BPF subprograms don't suffer from this
issues, as they are not independent entities and are neither exposed in BPF
skeleton, nor is lookup-able by any of libbpf APIs (and there is little reason
to do that anyways).
This and few other things will be the topic of the next set of patches.
Some tests rely on Clang fix ([0]), so need latest Clang built from main.
[0] https://reviews.llvm.org/D100362
v2->v3:
- allow only STV_DEFAULT and STV_HIDDEN ELF symbol visibility (Yonghong);
- update selftests' README for required Clang 13 fix dependency (Alexei);
- comments, typos, slight code changes (Yonghong, Alexei);
v1->v2:
- make map externs support full attribute list, adjust linked_maps selftest
to demonstrate that typedef works now (though no shared header file was
added to simplicity sake) (Alexei);
- remove commented out parts from selftests and fix few minor code style
issues;
- special __weak map definition semantics not yet implemented and will be
addressed in a follow up.
====================
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Andrii Nakryiko [Fri, 23 Apr 2021 18:13:48 +0000 (11:13 -0700)]
selftests/bpf: Document latest Clang fix expectations for linking tests
Document which fixes are required to generate correct static linking
selftests.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20210423181348.1801389-19-andrii@kernel.org
Andrii Nakryiko [Fri, 23 Apr 2021 18:13:47 +0000 (11:13 -0700)]
selftests/bpf: Add map linking selftest
Add selftest validating various aspects of statically linking BTF-defined map
definitions. Legacy map definitions do not support extern resolution between
object files. Some of the aspects validated:
- correct resolution of extern maps against concrete map definitions;
- extern maps can currently only specify map type and key/value size and/or
type information;
- weak concrete map definitions are resolved properly.
Static map definitions are not yet supported by libbpf, so they are not
explicitly tested, though manual testing showes that BPF linker handles them
properly.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20210423181348.1801389-18-andrii@kernel.org
Andrii Nakryiko [Fri, 23 Apr 2021 18:13:46 +0000 (11:13 -0700)]
selftests/bpf: Add global variables linking selftest
Add selftest validating various aspects of statically linking global
variables:
- correct resolution of extern variables across .bss, .data, and .rodata
sections;
- correct handling of weak definitions;
- correct de-duplication of repeating special externs (.kconfig, .ksyms).
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20210423181348.1801389-17-andrii@kernel.org
Andrii Nakryiko [Fri, 23 Apr 2021 18:13:45 +0000 (11:13 -0700)]
selftests/bpf: Add function linking selftest
Add selftest validating various aspects of statically linking functions:
- no conflicts and correct resolution for name-conflicting static funcs;
- correct resolution of extern functions;
- correct handling of weak functions, both resolution itself and libbpf's
handling of unused weak function that "lost" (it leaves gaps in code with
no ELF symbols);
- correct handling of hidden visibility to turn global function into
"static" for the purpose of BPF verification.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20210423181348.1801389-16-andrii@kernel.org
Andrii Nakryiko [Fri, 23 Apr 2021 18:13:44 +0000 (11:13 -0700)]
selftests/bpf: Omit skeleton generation for multi-linked BPF object files
Skip generating individual BPF skeletons for files that are supposed to be
linked together to form the final BPF object file. Very often such files are
"incomplete" BPF object files, which will fail libbpf bpf_object__open() step,
if used individually, thus failing BPF skeleton generation. This is by design,
so skip individual BPF skeletons and only validate them as part of their
linked final BPF object file and skeleton.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20210423181348.1801389-15-andrii@kernel.org
Andrii Nakryiko [Fri, 23 Apr 2021 18:13:43 +0000 (11:13 -0700)]
selftests/bpf: Use -O0 instead of -Og in selftests builds
While -Og is designed to work well with debugger, it's still inferior to -O0
in terms of debuggability experience. It will cause some variables to still be
inlined, it will also prevent single-stepping some statements and otherwise
interfere with debugging experience. So switch to -O0 which turns off any
optimization and provides the best debugging experience.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20210423181348.1801389-14-andrii@kernel.org
Andrii Nakryiko [Fri, 23 Apr 2021 18:13:42 +0000 (11:13 -0700)]
libbpf: Support extern resolution for BTF-defined maps in .maps section
Add extra logic to handle map externs (only BTF-defined maps are supported for
linking). Re-use the map parsing logic used during bpf_object__open(). Map
externs are currently restricted to always match complete map definition. So
all the specified attributes will be compared (down to pining, map_flags,
numa_node, etc). In the future this restriction might be relaxed with no
backwards compatibility issues. If any attribute is mismatched between extern
and actual map definition, linker will report an error, pointing out which one
mismatches.
The original intent was to allow for extern to specify attributes that matters
(to user) to enforce. E.g., if you specify just key information and omit
value, then any value fits. Similarly, it should have been possible to enforce
map_flags, pinning, and any other possible map attribute. Unfortunately, that
means that multiple externs can be only partially overlapping with each other,
which means linker would need to combine their type definitions to end up with
the most restrictive and fullest map definition. This requires an extra amount
of BTF manipulation which at this time was deemed unnecessary and would
require further extending generic BTF writer APIs. So that is left for future
follow ups, if there will be demand for that. But the idea seems intresting
and useful, so I want to document it here.
Weak definitions are also supported, but are pretty strict as well, just
like externs: all weak map definitions have to match exactly. In the follow up
patches this most probably will be relaxed, with __weak map definitions being
able to differ between each other (with non-weak definition always winning, of
course).
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20210423181348.1801389-13-andrii@kernel.org
Andrii Nakryiko [Fri, 23 Apr 2021 18:13:41 +0000 (11:13 -0700)]
libbpf: Add linker extern resolution support for functions and global variables
Add BPF static linker logic to resolve extern variables and functions across
multiple linked together BPF object files.
For that, linker maintains a separate list of struct glob_sym structures,
which keeps track of few pieces of metadata (is it extern or resolved global,
is it a weak symbol, which ELF section it belongs to, etc) and ties together
BTF type info and ELF symbol information and keeps them in sync.
With adding support for extern variables/funcs, it's now possible for some
sections to contain both extern and non-extern definitions. This means that
some sections may start out as ephemeral (if only externs are present and thus
there is not corresponding ELF section), but will be "upgraded" to actual ELF
section as symbols are resolved or new non-extern definitions are appended.
Additional care is taken to not duplicate extern entries in sections like
.kconfig and .ksyms.
Given libbpf requires BTF type to always be present for .kconfig/.ksym
externs, linker extends this requirement to all the externs, even those that
are supposed to be resolved during static linking and which won't be visible
to libbpf. With BTF information always present, static linker will check not
just ELF symbol matches, but entire BTF type signature match as well. That
logic is stricter that BPF CO-RE checks. It probably should be re-used by
.ksym resolution logic in libbpf as well, but that's left for follow up
patches.
To make it unnecessary to rewrite ELF symbols and minimize BTF type
rewriting/removal, ELF symbols that correspond to externs initially will be
updated in place once they are resolved. Similarly for BTF type info, VAR/FUNC
and var_secinfo's (sec_vars in struct bpf_linker) are staying stable, but
types they point to might get replaced when extern is resolved. This might
leave some left-over types (even though we try to minimize this for common
cases of having extern funcs with not argument names vs concrete function with
names properly specified). That can be addresses later with a generic BTF
garbage collection. That's left for a follow up as well.
Given BTF type appending phase is separate from ELF symbol
appending/resolution, special struct glob_sym->underlying_btf_id variable is
used to communicate resolution and rewrite decisions. 0 means
underlying_btf_id needs to be appended (it's not yet in final linker->btf), <0
values are used for temporary storage of source BTF type ID (not yet
rewritten), so -glob_sym->underlying_btf_id is BTF type id in obj-btf. But by
the end of linker_append_btf() phase, that underlying_btf_id will be remapped
and will always be > 0. This is the uglies part of the whole process, but
keeps the other parts much simpler due to stability of sec_var and VAR/FUNC
types, as well as ELF symbol, so please keep that in mind while reviewing.
BTF-defined maps require some extra custom logic and is addressed separate in
the next patch, so that to keep this one smaller and easier to review.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20210423181348.1801389-12-andrii@kernel.org
Andrii Nakryiko [Fri, 23 Apr 2021 18:13:40 +0000 (11:13 -0700)]
libbpf: Tighten BTF type ID rewriting with error checking
It should never fail, but if it does, it's better to know about this rather
than end up with nonsensical type IDs.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20210423181348.1801389-11-andrii@kernel.org
Andrii Nakryiko [Fri, 23 Apr 2021 18:13:39 +0000 (11:13 -0700)]
libbpf: Extend sanity checking ELF symbols with externs validation
Add logic to validate extern symbols, plus some other minor extra checks, like
ELF symbol #0 validation, general symbol visibility and binding validations.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20210423181348.1801389-10-andrii@kernel.org
Andrii Nakryiko [Fri, 23 Apr 2021 18:13:38 +0000 (11:13 -0700)]
libbpf: Make few internal helpers available outside of libbpf.c
Make skip_mods_and_typedefs(), btf_kind_str(), and btf_func_linkage() helpers
available outside of libbpf.c, to be used by static linker code.
Also do few cleanups (error code fixes, comment clean up, etc) that don't
deserve their own commit.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20210423181348.1801389-9-andrii@kernel.org
Andrii Nakryiko [Fri, 23 Apr 2021 18:13:37 +0000 (11:13 -0700)]
libbpf: Factor out symtab and relos sanity checks
Factor out logic for sanity checking SHT_SYMTAB and SHT_REL sections into
separate sections. They are already quite extensive and are suffering from too
deep indentation. Subsequent changes will extend SYMTAB sanity checking
further, so it's better to factor each into a separate function.
No functional changes are intended.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20210423181348.1801389-8-andrii@kernel.org
Andrii Nakryiko [Fri, 23 Apr 2021 18:13:36 +0000 (11:13 -0700)]
libbpf: Refactor BTF map definition parsing
Refactor BTF-defined maps parsing logic to allow it to be nicely reused by BPF
static linker. Further, at least for BPF static linker, it's important to know
which attributes of a BPF map were defined explicitly, so provide a bit set
for each known portion of BTF map definition. This allows BPF static linker to
do a simple check when dealing with extern map declarations.
The same capabilities allow to distinguish attributes explicitly set to zero
(e.g., __uint(max_entries, 0)) vs the case of not specifying it at all (no
max_entries attribute at all). Libbpf is currently not utilizing that, but it
could be useful for backwards compatibility reasons later.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20210423181348.1801389-7-andrii@kernel.org
Andrii Nakryiko [Fri, 23 Apr 2021 18:13:35 +0000 (11:13 -0700)]
libbpf: Allow gaps in BPF program sections to support overriden weak functions
Currently libbpf is very strict about parsing BPF program instruction
sections. No gaps are allowed between sequential BPF programs within a given
ELF section. Libbpf enforced that by keeping track of the next section offset
that should start a new BPF (sub)program and cross-checks that by searching
for a corresponding STT_FUNC ELF symbol.
But this is too restrictive once we allow to have weak BPF programs and link
together two or more BPF object files. In such case, some weak BPF programs
might be "overridden" by either non-weak BPF program with the same name and
signature, or even by another weak BPF program that just happened to be linked
first. That, in turn, leaves BPF instructions of the "lost" BPF (sub)program
intact, but there is no corresponding ELF symbol, because no one is going to
be referencing it.
Libbpf already correctly handles such cases in the sense that it won't append
such dead code to actual BPF programs loaded into kernel. So the only change
that needs to be done is to relax the logic of parsing BPF instruction
sections. Instead of assuming next BPF (sub)program section offset, iterate
available STT_FUNC ELF symbols to discover all available BPF subprograms and
programs.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20210423181348.1801389-6-andrii@kernel.org
Andrii Nakryiko [Fri, 23 Apr 2021 18:13:34 +0000 (11:13 -0700)]
libbpf: Mark BPF subprogs with hidden visibility as static for BPF verifier
Define __hidden helper macro in bpf_helpers.h, which is a short-hand for
__attribute__((visibility("hidden"))). Add libbpf support to mark BPF
subprograms marked with __hidden as static in BTF information to enforce BPF
verifier's static function validation algorithm, which takes more information
(caller's context) into account during a subprogram validation.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20210423181348.1801389-5-andrii@kernel.org
Andrii Nakryiko [Fri, 23 Apr 2021 18:13:33 +0000 (11:13 -0700)]
libbpf: Suppress compiler warning when using SEC() macro with externs
When used on externs SEC() macro will trigger compilation warning about
inapplicable `__attribute__((used))`. That's expected for extern declarations,
so suppress it with the corresponding _Pragma.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20210423181348.1801389-4-andrii@kernel.org
Andrii Nakryiko [Fri, 23 Apr 2021 18:13:32 +0000 (11:13 -0700)]
bpftool: Dump more info about DATASEC members
Dump succinct information for each member of DATASEC: its kinds and name. This
is extremely helpful to see at a quick glance what is inside each DATASEC of
a given BTF. Without this, one has to jump around BTF data to just find out
the name of a VAR or FUNC. DATASEC's var_secinfo member is special in that
regard because it doesn't itself contain the name of the member, delegating
that to the referenced VAR and FUNC kinds. Other kinds, like
STRUCT/UNION/FUNC/ENUM, encode member names directly and thus are clearly
identifiable in BTF dump.
The new output looks like this:
[35] DATASEC '.bss' size=0 vlen=6
type_id=8 offset=0 size=4 (VAR 'input_bss1')
type_id=13 offset=0 size=4 (VAR 'input_bss_weak')
type_id=16 offset=0 size=4 (VAR 'output_bss1')
type_id=17 offset=0 size=4 (VAR 'output_data1')
type_id=18 offset=0 size=4 (VAR 'output_rodata1')
type_id=20 offset=0 size=8 (VAR 'output_sink1')
[36] DATASEC '.data' size=0 vlen=2
type_id=9 offset=0 size=4 (VAR 'input_data1')
type_id=14 offset=0 size=4 (VAR 'input_data_weak')
[37] DATASEC '.kconfig' size=0 vlen=2
type_id=25 offset=0 size=4 (VAR 'LINUX_KERNEL_VERSION')
type_id=28 offset=0 size=1 (VAR 'CONFIG_BPF_SYSCALL')
[38] DATASEC '.ksyms' size=0 vlen=1
type_id=30 offset=0 size=1 (VAR 'bpf_link_fops')
[39] DATASEC '.rodata' size=0 vlen=2
type_id=12 offset=0 size=4 (VAR 'input_rodata1')
type_id=15 offset=0 size=4 (VAR 'input_rodata_weak')
[40] DATASEC 'license' size=0 vlen=1
type_id=24 offset=0 size=4 (VAR 'LICENSE')
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20210423181348.1801389-3-andrii@kernel.org
Andrii Nakryiko [Fri, 23 Apr 2021 18:13:31 +0000 (11:13 -0700)]
bpftool: Support dumping BTF VAR's "extern" linkage
Add dumping of "extern" linkage for BTF VAR kind. Also shorten
"global-allocated" to "global" to be in line with FUNC's "global".
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20210423181348.1801389-2-andrii@kernel.org
David S. Miller [Fri, 23 Apr 2021 21:04:43 +0000 (14:04 -0700)]
Merge branch '40GbE' of git://git./linux/kernel/git/tnguy/next-queue
Tony Nguyen says:
====================
40GbE Intel Wired LAN Driver Updates 2021-04-23
This series contains updates to i40e and iavf drivers.
Aleksandr adds support for VIRTCHNL_VF_CAP_ADV_LINK_SPEED in i40e which
allows for reporting link speed to VF as a value instead of using an
enum; helper functions are created to remove repeated code.
Coiby Xu reduces memory use of i40e when using kdump by reducing Tx, Rx,
and admin queue to minimum values. Current use causes failure of kdump.
Stefan Assmann removes duplicated free calls in iavf.
Haiyue cleans up a loop to return directly when if the value is found
and changes some magic numbers to defines for better maintainability
in iavf.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 23 Apr 2021 21:01:28 +0000 (14:01 -0700)]
Merge branch 'mlxsw-selftest-fixes'
Petr Machata says:
====================
selftests: mlxsw: Fixes
This patch set carries fixes to selftest issues that we have hit in our
nightly regression run. Almost all are in mlxsw selftests, though one is in
a generic forwarding selftest.
- In patch #1, in an ERSPAN test, install an FDB entry as static instead of
(implicitly) as local.
- In the mlxsw resource-scale test, an if statement overrides the value of
$?, which is supposed to contain the result of the test. As a result, the
resource scale test can spuriously pass.
In patches #2 and #3, remove the if statements to fix the issue in,
respectively, port_scale test and tc_flower_scale tests.
- Again in the mlxsw resource-scale test, when more then one sub-test is
run, a successful sub-test overrides any previous failures. This causes a
spurious pass of the overall test. This is fixed in patch #4.
- In patch #5, increase a tolerance in a mlxsw-specific RED backlog test.
This test is very noisy, due to rounding errors and the unpredictability
of software traffic generation. By bumping the tolerance from 5 % to 10,
get the failure rate to zero. This shouldn't impact the accuracy,
mistakes in backlog configuration (e.g. due to wrong cell size) are
likely to cause a much larger discrepancy.
- In patch #6, fix mausezahn invocation in the mlxsw ERSPAN scale
test. The test failed because of the wrong invocation.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Petr Machata [Fri, 23 Apr 2021 12:19:48 +0000 (14:19 +0200)]
selftests: mlxsw: Fix mausezahn invocation in ERSPAN scale test
The mirror_gre_scale test creates as many ERSPAN sessions as the underlying
chip supports, and tests that they all work. In order to determine that it
issues a stream of ICMP packets and checks if they are mirrored as
expected.
However, the mausezahn invocation missed the -6 flag to identify the use of
IPv6 protocol, and was sending ICMP messages over IPv6, as opposed to
ICMP6. It also didn't pass an explicit source IP address, which apparently
worked at some point in the past, but does not anymore.
To fix these issues, extend the function mirror_test() in mirror_lib by
detecting the IPv6 protocol addresses, and using a different ICMP scheme.
Fix __mirror_gre_test() in the selftest itself to pass a source IP address.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Petr Machata [Fri, 23 Apr 2021 12:19:47 +0000 (14:19 +0200)]
selftests: mlxsw: Increase the tolerance of backlog buildup
The intention behind this test is to make sure that qdisc limit is
correctly projected to the HW. However, first, due to rounding in the
qdisc, and then in the driver, the number cannot actually be accurate. And
second, the approach to testing this is to oversubscribe the port with
traffic generated on the same switch. The actual backlog size therefore
fluctuates.
In practice, this test proved to be noisier than the rest, and spuriously
fails every now and then. Increase the tolerance to 10 % to avoid these
issues.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Acked-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Danielle Ratson [Fri, 23 Apr 2021 12:19:46 +0000 (14:19 +0200)]
selftests: mlxsw: Return correct error code in resource scale tests
Currently, the resource scale test checks a few cases, when the error code
resets between the cases. So for example, if one case fails and the
consecutive case passes, the error code eventually will fit the last test
and will be 0.
Save a new return code that will hold the 'or' return codes of all the
cases, so the final return code will consider all the cases.
Signed-off-by: Danielle Ratson <danieller@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Danielle Ratson [Fri, 23 Apr 2021 12:19:45 +0000 (14:19 +0200)]
selftests: mlxsw: Remove a redundant if statement in tc_flower_scale test
Currently, the error return code of the failure condition is lost after
using an if statement, so the test doesn't fail when it should.
Remove the if statement that separates the condition and the error code
check, so the test won't always pass.
Fixes:
abfce9e062021 ("selftests: mlxsw: Reduce running time using offload indication")
Reported-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Danielle Ratson <danieller@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Danielle Ratson [Fri, 23 Apr 2021 12:19:44 +0000 (14:19 +0200)]
selftests: mlxsw: Remove a redundant if statement in port_scale test
Currently, the error return code of the failure condition is lost after
using an if statement, so the test doesn't fail when it should.
Remove the if statement that separates the condition and the error code
check, so the test won't always pass.
Fixes:
5154b1b826d9b ("selftests: mlxsw: Add a scale test for physical ports")
Reported-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Danielle Ratson <danieller@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Petr Machata [Fri, 23 Apr 2021 12:19:43 +0000 (14:19 +0200)]
selftests: net: mirror_gre_vlan_bridge_1q: Make an FDB entry static
The FDB roaming test installs a destination MAC address on the wrong
interface of an FDB database and tests whether the mirroring fails, because
packets are sent to the wrong port. The test by mistake installs the FDB
entry as local. This worked previously, because drivers were notified of
local FDB entries in the same way as of static entries. However that has
been fixed in the commit
6ab4c3117aec ("net: bridge: don't notify switchdev
for local FDB addresses"), and local entries are not notified anymore. As a
result, the HW is not reconfigured for the FDB roam, and mirroring keeps
working, failing the test.
To fix the issue, mark the FDB entry as static.
Fixes:
9c7c8a82442c ("selftests: forwarding: mirror_gre_vlan_bridge_1q: Add more tests")
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 23 Apr 2021 20:58:25 +0000 (13:58 -0700)]
Merge tag 'wireless-drivers-next-2021-04-23' of git://git./linux/kernel/git/kvalo/wireless-drivers-next
Kalle Valo says:
====================
wireless-drivers-next patches for v5.13
Third, and final, set of patches for v5.13. We got one more week
before the merge window and this includes from that extra week.
Smaller features to rtw88 and mt76, but mostly this contains fixes.
rtw88
* 8822c: Add gap-k calibration to improve long range performance
mt76
* parse rate power limits from DT
* debugfs file to test firmware crash
* debugfs to disable NAPI threaded mode
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 23 Apr 2021 20:55:43 +0000 (13:55 -0700)]
Merge branch 'r8152-adjust-REALTEK_USB_DEVICE'
Hayes Wang says:
====================
r8152: adjust REALTEK_USB_DEVICE
Modify REALTEK_USB_DEVICE macro.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Hayes Wang [Fri, 23 Apr 2021 09:44:55 +0000 (17:44 +0800)]
r8152: redefine REALTEK_USB_DEVICE macro
Redefine REALTEK_USB_DEVICE macro with USB_DEVICE_INTERFACE_CLASS and
USB_DEVICE_AND_INTERFACE_INFO to simply the code.
Although checkpatch.pl shows the following error, it is more readable.
ERROR: Macros with complex values should be enclosed in parentheses
Signed-off-by: Hayes Wang <hayeswang@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Hayes Wang [Fri, 23 Apr 2021 09:44:54 +0000 (17:44 +0800)]
r8152: remove NCM mode from REALTEK_USB_DEVICE macro
The RTL8156 support CDC NCM mode. And users could set the configuration
of the USB device between vendor and NCM mode dynamically by themselves.
That is, the driver doesn't need to set vendor mode from NCM mode.
Fixes:
195aae321c82 ("r8152: support new chips")
Signed-off-by: Hayes Wang <hayeswang@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Yangbo Lu [Fri, 23 Apr 2021 09:33:55 +0000 (17:33 +0800)]
enetc: fix locking for one-step timestamping packet transfer
The previous patch to support PTP Sync packet one-step timestamping
described one-step timestamping packet handling logic as below in
commit message:
- Trasmit packet immediately if no other one in transfer, or queue to
skb queue if there is already one in transfer.
The test_and_set_bit_lock() is used here to lock and check state.
- Start a work when complete transfer on hardware, to release the bit
lock and to send one skb in skb queue if has.
There was not problem of the description, but there was a mistake in
implementation. The locking/test_and_set_bit_lock() should be put in
enetc_start_xmit() which may be called by worker, rather than in
enetc_xmit(). Otherwise, the worker calling enetc_start_xmit() after
bit lock released is not able to lock again for transfer.
Fixes:
7294380c5211 ("enetc: support PTP Sync packet one-step timestamping")
Signed-off-by: Yangbo Lu <yangbo.lu@nxp.com>
Reviewed-by: Claudiu Manoil <claudiu.manoil@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 23 Apr 2021 20:37:47 +0000 (13:37 -0700)]
Merge branch 'master' of git://git./linux/kernel/git/klassert/ipsec-next
Steffen Klassert says:
====================
pull request (net-next): ipsec-next 2021-04-23
1) The SPI flow key in struct flowi has no consumers,
so remove it. From Florian Westphal.
2) Remove stray synchronize_rcu from xfrm_init.
From Florian Westphal.
3) Use the new exit_pre hook to reset the netlink socket
on net namespace destruction. From Florian Westphal.
4) Remove an unnecessary get_cpu() in ipcomp, that
code is always called with BHs off.
From Sabrina Dubroca.
Please pull or let me know if there are problems.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 23 Apr 2021 20:31:58 +0000 (13:31 -0700)]
Merge branch 'mk_eth_soc_fixes-perf-improvements'
Ilya Lipnitskiy says:
====================
mtk_eth_soc: fixes and performance improvements
Most of these changes come from OpenWrt where they have been present and
tested for months.
First three patches are bug fixes. The rest are performance
improvements. The last patch is a cleanup to use the iopoll.h macro for
busy-waiting instead of a custom loop.
v2:
- Reverse christmas tree in "use iopoll.h macro for DMA init"
- Use cond_resched() instead of iopoll.h macro in "reduce MDIO bus
access latency"
- Use napi_complete_done and rework NAPI callbacks in a new patch
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Ilya Lipnitskiy [Fri, 23 Apr 2021 05:21:08 +0000 (22:21 -0700)]
net: ethernet: mtk_eth_soc: use iopoll.h macro for DMA init
Replace a tight busy-wait loop without a pause with a standard
readx_poll_timeout_atomic routine with a 5 us poll period.
Tested by booting a MT7621 device to ensure the driver initializes
properly.
Signed-off-by: Ilya Lipnitskiy <ilya.lipnitskiy@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Felix Fietkau [Fri, 23 Apr 2021 05:21:07 +0000 (22:21 -0700)]
net: ethernet: mtk_eth_soc: set PPE flow hash as skb hash if present
This improves GRO performance
Signed-off-by: Felix Fietkau <nbd@nbd.name>
[Ilya: Use MTK_RXD4_FOE_ENTRY instead of GENMASK(13, 0)]
Signed-off-by: Ilya Lipnitskiy <ilya.lipnitskiy@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ilya Lipnitskiy [Fri, 23 Apr 2021 05:21:06 +0000 (22:21 -0700)]
net: ethernet: mtk_eth_soc: rework NAPI callbacks
Use napi_complete_done to communicate total TX and RX work done to NAPI.
Count total RX work up instead of remaining work down for clarity.
Remove unneeded local variables for clarity. Use do {} while instead of
goto for clarity.
Suggested-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Ilya Lipnitskiy <ilya.lipnitskiy@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Felix Fietkau [Fri, 23 Apr 2021 05:21:05 +0000 (22:21 -0700)]
net: ethernet: mtk_eth_soc: reduce unnecessary interrupts
Avoid rearming interrupt if napi_complete returns false
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Signed-off-by: Ilya Lipnitskiy <ilya.lipnitskiy@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Felix Fietkau [Fri, 23 Apr 2021 05:21:04 +0000 (22:21 -0700)]
net: ethernet: mtk_eth_soc: only read the full RX descriptor if DMA is done
Uncached memory access is expensive, and there is no need to access all
descriptor words if we can't process them anyway
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Signed-off-by: Ilya Lipnitskiy <ilya.lipnitskiy@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>