Wang Hai [Wed, 9 Jun 2021 11:56:51 +0000 (19:56 +0800)]
libbpf: Simplify the return expression of bpf_object__init_maps function
There is no need for special treatment of the 'ret == 0' case.
This patch simplifies the return expression.
Signed-off-by: Wang Hai <wanghai38@huawei.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20210609115651.3392580-1-wanghai38@huawei.com
Joe Stringer [Tue, 8 Jun 2021 01:57:56 +0000 (18:57 -0700)]
selftests, bpf: Make docs tests fail more reliably
Previously, if rst2man caught errors, then these would be ignored and
the output file would be written anyway. This would allow developers to
introduce regressions in the docs comments in the BPF headers.
Additionally, even if you instruct rst2man to fail out, it will still
write out to the destination target file, so if you ran the tests twice
in a row it would always pass. Use a temporary file for the initial run
to ensure that if rst2man fails out under "--strict" mode, subsequent
runs will not automatically pass.
Tested via ./tools/testing/selftests/bpf/test_doc_build.sh
Signed-off-by: Joe Stringer <joe@cilium.io>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Quentin Monnet <quentin@isovalent.com>
Link: https://lore.kernel.org/bpf/20210608015756.340385-1-joe@cilium.io
Michal Suchanek [Fri, 4 Jun 2021 11:24:48 +0000 (13:24 +0200)]
libbpf: Fix pr_warn type warnings on 32bit
The printed value is ptrdiff_t and is formatted wiht %ld. This works on
64bit but produces a warning on 32bit. Fix the format specifier to %td.
Fixes:
67234743736a ("libbpf: Generate loader program out of BPF ELF file.")
Signed-off-by: Michal Suchanek <msuchanek@suse.de>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20210604112448.32297-1-msuchanek@suse.de
Jean-Philippe Brucker [Thu, 3 Jun 2021 17:05:16 +0000 (19:05 +0200)]
tools/bpftool: Fix cross-build
When the bootstrap and final bpftool have different architectures, we
need to build two distinct disasm.o objects. Add a recipe for the
bootstrap disasm.o.
After commit
d510296d331a ("bpftool: Use syscall/loader program in
"prog load" and "gen skeleton" command.") cross-building bpftool didn't
work anymore, because the bootstrap bpftool was linked using objects
from different architectures:
$ make O=/tmp/bpftool ARCH=arm64 CROSS_COMPILE=aarch64-linux-gnu- -C tools/bpf/bpftool/ V=1
[...]
aarch64-linux-gnu-gcc ... -c -MMD -o /tmp/bpftool/disasm.o /home/z/src/linux/kernel/bpf/disasm.c
gcc ... -c -MMD -o /tmp/bpftool//bootstrap/main.o main.c
gcc ... -o /tmp/bpftool//bootstrap/bpftool /tmp/bpftool//bootstrap/main.o ... /tmp/bpftool/disasm.o
/usr/bin/ld: /tmp/bpftool/disasm.o: Relocations in generic ELF (EM: 183)
/usr/bin/ld: /tmp/bpftool/disasm.o: Relocations in generic ELF (EM: 183)
/usr/bin/ld: /tmp/bpftool/disasm.o: Relocations in generic ELF (EM: 183)
/usr/bin/ld: /tmp/bpftool/disasm.o: error adding symbols: file in wrong format
collect2: error: ld returned 1 exit status
[...]
The final bpftool was built for e.g. arm64, while the bootstrap bpftool,
executed on the host, was built for x86. The problem here was that disasm.o
linked into the bootstrap bpftool was arm64 rather than x86. With the fix
we build two disasm.o, one for the target bpftool in arm64, and one for
the bootstrap bpftool in x86.
Fixes:
d510296d331a ("bpftool: Use syscall/loader program in "prog load" and "gen skeleton" command.")
Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210603170515.1854642-1-jean-philippe@linaro.org
Andrii Nakryiko [Thu, 3 Jun 2021 00:40:26 +0000 (17:40 -0700)]
selftests/bpf: Add xdp_redirect_multi into .gitignore
When xdp_redirect_multi test binary was added recently, it wasn't added to
.gitignore. Fix that.
Fixes:
d23292476297 ("selftests/bpf: Add xdp_redirect_multi test")
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20210603004026.2698513-5-andrii@kernel.org
Andrii Nakryiko [Thu, 3 Jun 2021 00:40:25 +0000 (17:40 -0700)]
libbpf: Install skel_internal.h header used from light skeletons
Light skeleton code assumes skel_internal.h header to be installed system-wide
by libbpf package. Make sure it is actually installed.
Fixes:
67234743736a ("libbpf: Generate loader program out of BPF ELF file.")
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20210603004026.2698513-4-andrii@kernel.org
Andrii Nakryiko [Thu, 3 Jun 2021 00:40:24 +0000 (17:40 -0700)]
libbpf: Refactor header installation portions of Makefile
As we gradually get more headers that have to be installed, it's quite
annoying to copy/paste long $(call) commands. So extract that logic and do
a simple $(foreach) over the list of headers.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20210603004026.2698513-3-andrii@kernel.org
Andrii Nakryiko [Thu, 3 Jun 2021 00:40:23 +0000 (17:40 -0700)]
libbpf: Move few APIs from 0.4 to 0.5 version
Official libbpf 0.4 release doesn't include three APIs that were tentatively
put into 0.4 section. Fix libbpf.map and move these three APIs:
- bpf_map__initial_value;
- bpf_map_lookup_and_delete_elem_flags;
- bpf_object__gen_loader.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20210603004026.2698513-2-andrii@kernel.org
Harishankar Vishwanathan [Mon, 31 May 2021 02:01:57 +0000 (22:01 -0400)]
bpf, tnums: Provably sound, faster, and more precise algorithm for tnum_mul
This patch introduces a new algorithm for multiplication of tristate
numbers (tnums) that is provably sound. It is faster and more precise when
compared to the existing method.
Like the existing method, this new algorithm follows the long
multiplication algorithm. The idea is to generate partial products by
multiplying each bit in the multiplier (tnum a) with the multiplicand
(tnum b), and adding the partial products after appropriately bit-shifting
them. The new algorithm, however, uses just a single loop over the bits of
the multiplier (tnum a) and accumulates only the uncertain components of
the multiplicand (tnum b) into a mask-only tnum. The following paper
explains the algorithm in more detail: https://arxiv.org/abs/2105.05398.
A natural way to construct the tnum product is by performing a tnum
addition on all the partial products. This algorithm presents another
method of doing this: decompose each partial product into two tnums,
consisting of the values and the masks separately. The mask-sum is
accumulated within the loop in acc_m. The value-sum tnum is generated
using a.value * b.value. The tnum constructed by tnum addition of the
value-sum and the mask-sum contains all possible summations of concrete
values drawn from the partial product tnums pairwise. We prove this result
in the paper.
Our evaluations show that the new algorithm is overall more precise
(producing tnums with less uncertain components) than the existing method.
As an illustrative example, consider the input tnums A and B. The numbers
in the parenthesis correspond to (value;mask).
A = 000000x1 (1;2)
B = 0010011x (38;1)
A * B (existing) = xxxxxxxx (0;255)
A * B (new) = 0x1xxxxx (32;95)
Importantly, we present a proof of soundness of the new algorithm in the
aforementioned paper. Additionally, we show that this new algorithm is
empirically faster than the existing method.
Co-developed-by: Matan Shachnai <m.shachnai@rutgers.edu>
Co-developed-by: Srinivas Narayana <srinivas.narayana@rutgers.edu>
Co-developed-by: Santosh Nagarakatte <santosh.nagarakatte@rutgers.edu>
Signed-off-by: Matan Shachnai <m.shachnai@rutgers.edu>
Signed-off-by: Srinivas Narayana <srinivas.narayana@rutgers.edu>
Signed-off-by: Santosh Nagarakatte <santosh.nagarakatte@rutgers.edu>
Signed-off-by: Harishankar Vishwanathan <harishankar.vishwanathan@rutgers.edu>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Edward Cree <ecree.xilinx@gmail.com>
Link: https://arxiv.org/abs/2105.05398
Link: https://lore.kernel.org/bpf/20210531020157.7386-1-harishankar.vishwanathan@rutgers.edu
Hangbin Liu [Fri, 28 May 2021 02:43:56 +0000 (22:43 -0400)]
bpf, devmap: Remove drops variable from bq_xmit_all()
As Colin pointed out, the first drops assignment after declaration will
be overwritten by the second drops assignment before using, which makes
it useless.
Since the drops variable will be used only once. Just remove it and
use "cnt - sent" in trace_xdp_devmap_xmit().
Fixes:
cb261b594b41 ("bpf: Run devmap xdp_prog on flush instead of bulk enqueue")
Reported-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20210528024356.24333-1-liuhangbin@gmail.com
Yonghong Song [Wed, 26 May 2021 15:24:57 +0000 (08:24 -0700)]
bpf, docs: Add llvm_reloc.rst to explain llvm bpf relocations
LLVM upstream commit https://reviews.llvm.org/D102712 made some changes
to bpf relocations to make them llvm linker lld friendly. The scope of
existing relocations R_BPF_64_{64,32} is narrowed and new relocations
R_BPF_64_{ABS32,ABS64,NODYLD32} are introduced.
Let us add some documentation about llvm bpf relocations so people can
understand how to resolve them properly in their respective tools.
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20210526152457.335210-1-yhs@fb.com
Florent Revest [Wed, 26 May 2021 16:46:43 +0000 (18:46 +0200)]
libbpf: Move BPF_SEQ_PRINTF and BPF_SNPRINTF to bpf_helpers.h
These macros are convenient wrappers around the bpf_seq_printf and
bpf_snprintf helpers. They are currently provided by bpf_tracing.h which
targets low level tracing primitives. bpf_helpers.h is a better fit.
The __bpf_narg and __bpf_apply are needed in both files and provided
twice. __bpf_empty isn't used anywhere and is removed from bpf_tracing.h
Reported-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Florent Revest <revest@chromium.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210526164643.2881368-1-revest@chromium.org
Daniel Borkmann [Wed, 26 May 2021 07:46:17 +0000 (09:46 +0200)]
Merge branch 'bpf-xdp-bcast'
Hangbin Liu says:
====================
This patchset is a new implementation for XDP multicast support based
on my previous 2 maps implementation[1]. The reason is that Daniel thinks
the exclude map implementation is missing proper bond support in XDP
context. And there is a plan to add native XDP bonding support. Adding
a exclude map in the helper also increases the complexity of verifier and
has drawbacks on performance.
The new implementation just add two new flags BPF_F_BROADCAST and
BPF_F_EXCLUDE_INGRESS to extend xdp_redirect_map for broadcast support.
With BPF_F_BROADCAST the packet will be broadcasted to all the interfaces
in the map. with BPF_F_EXCLUDE_INGRESS the ingress interface will be
excluded when do broadcasting.
The patchv11 link is here [2].
[1] https://lore.kernel.org/bpf/
20210223125809.1376577-1-liuhangbin@gmail.com
[2] https://lore.kernel.org/bpf/
20210513070447.1878448-1-liuhangbin@gmail.com
v12: As Daniel pointed out:
a) defined as const u64 for flag_mask and action_mask in
__bpf_xdp_redirect_map()
b) remove BPF_F_ACTION_MASK in uapi header
c) remove EXPORT_SYMBOL_GPL for xdpf_clone()
v11:
a) Use unlikely() when checking if this is for broadcast redirecting.
b) Fix a tracepoint NULL pointer issue Jesper found
c) Remove BPF_F_REDIR_MASK and just use OR flags to make the reader more
clear about what's flags we are using
d) Add the performace number with multi veth interfaces in patch 01
description.
e) remove some sleeps to reduce the testing time in patch04. Re-struct the
test and make clear what flags we are testing.
v10: use READ/WRITE_ONCE when read/write map instead of xchg()
v9: Update patch 01 commit description
v8: use hlist_for_each_entry_rcu() when looping the devmap hash ojbs
v7: No need to free xdpf in dev_map_enqueue_clone() if xdpf_clone failed.
v6: Fix a skb leak in the error path for generic XDP
v5: Just walk the map directly to get interfaces as get_next_key() of devmap
hash may restart looping from the first key if the device get removed.
After update the performace has improved 10% compired with v4.
v4: Fix flags never cleared issue in patch 02. Update selftest to cover this.
v3: Rebase the code based on latest bpf-next
v2: fix flag renaming issue in patch 02
====================
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Hangbin Liu [Wed, 19 May 2021 09:07:47 +0000 (17:07 +0800)]
selftests/bpf: Add xdp_redirect_multi test
Add a bpf selftest for new helper xdp_redirect_map_multi(). In this
test there are 3 forward groups and 1 exclude group. The test will
redirect each interface's packets to all the interfaces in the forward
group, and exclude the interface in exclude map.
Two maps (DEVMAP, DEVMAP_HASH) and two xdp modes (generic, drive) will
be tested. XDP egress program will also be tested by setting pkt src MAC
to egress interface's MAC address.
For more test details, you can find it in the test script. Here is
the test result.
]# time ./test_xdp_redirect_multi.sh
Pass: xdpgeneric arp(F_BROADCAST) ns1-1
Pass: xdpgeneric arp(F_BROADCAST) ns1-2
Pass: xdpgeneric arp(F_BROADCAST) ns1-3
Pass: xdpgeneric IPv4 (F_BROADCAST|F_EXCLUDE_INGRESS) ns1-1
Pass: xdpgeneric IPv4 (F_BROADCAST|F_EXCLUDE_INGRESS) ns1-2
Pass: xdpgeneric IPv4 (F_BROADCAST|F_EXCLUDE_INGRESS) ns1-3
Pass: xdpgeneric IPv6 (no flags) ns1-1
Pass: xdpgeneric IPv6 (no flags) ns1-2
Pass: xdpdrv arp(F_BROADCAST) ns1-1
Pass: xdpdrv arp(F_BROADCAST) ns1-2
Pass: xdpdrv arp(F_BROADCAST) ns1-3
Pass: xdpdrv IPv4 (F_BROADCAST|F_EXCLUDE_INGRESS) ns1-1
Pass: xdpdrv IPv4 (F_BROADCAST|F_EXCLUDE_INGRESS) ns1-2
Pass: xdpdrv IPv4 (F_BROADCAST|F_EXCLUDE_INGRESS) ns1-3
Pass: xdpdrv IPv6 (no flags) ns1-1
Pass: xdpdrv IPv6 (no flags) ns1-2
Pass: xdpegress mac ns1-2
Pass: xdpegress mac ns1-3
Summary: PASS 18, FAIL 0
real 1m18.321s
user 0m0.123s
sys 0m0.350s
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://lore.kernel.org/bpf/20210519090747.1655268-5-liuhangbin@gmail.com
Hangbin Liu [Wed, 19 May 2021 09:07:46 +0000 (17:07 +0800)]
sample/bpf: Add xdp_redirect_map_multi for redirect_map broadcast test
This is a sample for xdp redirect broadcast. In the sample we could forward
all packets between given interfaces. There is also an option -X that could
enable 2nd xdp_prog on egress interface.
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://lore.kernel.org/bpf/20210519090747.1655268-4-liuhangbin@gmail.com
Hangbin Liu [Wed, 19 May 2021 09:07:45 +0000 (17:07 +0800)]
xdp: Extend xdp_redirect_map with broadcast support
This patch adds two flags BPF_F_BROADCAST and BPF_F_EXCLUDE_INGRESS to
extend xdp_redirect_map for broadcast support.
With BPF_F_BROADCAST the packet will be broadcasted to all the interfaces
in the map. with BPF_F_EXCLUDE_INGRESS the ingress interface will be
excluded when do broadcasting.
When getting the devices in dev hash map via dev_map_hash_get_next_key(),
there is a possibility that we fall back to the first key when a device
was removed. This will duplicate packets on some interfaces. So just walk
the whole buckets to avoid this issue. For dev array map, we also walk the
whole map to find valid interfaces.
Function bpf_clear_redirect_map() was removed in
commit
ee75aef23afe ("bpf, xdp: Restructure redirect actions").
Add it back as we need to use ri->map again.
With test topology:
+-------------------+ +-------------------+
| Host A (i40e 10G) | ---------- | eno1(i40e 10G) |
+-------------------+ | |
| Host B |
+-------------------+ | |
| Host C (i40e 10G) | ---------- | eno2(i40e 10G) |
+-------------------+ | |
| +------+ |
| veth0 -- | Peer | |
| veth1 -- | | |
| veth2 -- | NS | |
| +------+ |
+-------------------+
On Host A:
# pktgen/pktgen_sample03_burst_single_flow.sh -i eno1 -d $dst_ip -m $dst_mac -s 64
On Host B(Intel(R) Xeon(R) CPU E5-2690 v3 @ 2.60GHz, 128G Memory):
Use xdp_redirect_map and xdp_redirect_map_multi in samples/bpf for testing.
All the veth peers in the NS have a XDP_DROP program loaded. The
forward_map max_entries in xdp_redirect_map_multi is modify to 4.
Testing the performance impact on the regular xdp_redirect path with and
without patch (to check impact of additional check for broadcast mode):
5.12 rc4 | redirect_map i40e->i40e | 2.0M | 9.7M
5.12 rc4 | redirect_map i40e->veth | 1.7M | 11.8M
5.12 rc4 + patch | redirect_map i40e->i40e | 2.0M | 9.6M
5.12 rc4 + patch | redirect_map i40e->veth | 1.7M | 11.7M
Testing the performance when cloning packets with the redirect_map_multi
test, using a redirect map size of 4, filled with 1-3 devices:
5.12 rc4 + patch | redirect_map multi i40e->veth (x1) | 1.7M | 11.4M
5.12 rc4 + patch | redirect_map multi i40e->veth (x2) | 1.1M | 4.3M
5.12 rc4 + patch | redirect_map multi i40e->veth (x3) | 0.8M | 2.6M
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Link: https://lore.kernel.org/bpf/20210519090747.1655268-3-liuhangbin@gmail.com
Jesper Dangaard Brouer [Wed, 19 May 2021 09:07:44 +0000 (17:07 +0800)]
bpf: Run devmap xdp_prog on flush instead of bulk enqueue
This changes the devmap XDP program support to run the program when the
bulk queue is flushed instead of before the frame is enqueued. This has
a couple of benefits:
- It "sorts" the packets by destination devmap entry, and then runs the
same BPF program on all the packets in sequence. This ensures that we
keep the XDP program and destination device properties hot in I-cache.
- It makes the multicast implementation simpler because it can just
enqueue packets using bq_enqueue() without having to deal with the
devmap program at all.
The drawback is that if the devmap program drops the packet, the enqueue
step is redundant. However, arguably this is mostly visible in a
micro-benchmark, and with more mixed traffic the I-cache benefit should
win out. The performance impact of just this patch is as follows:
Using 2 10Gb i40e NIC, redirecting one to another, or into a veth interface,
which do XDP_DROP on veth peer. With xdp_redirect_map in sample/bpf, send
pkts via pktgen cmd:
./pktgen_sample03_burst_single_flow.sh -i eno1 -d $dst_ip -m $dst_mac -t 10 -s 64
There are about +/- 0.1M deviation for native testing, the performance
improved for the base-case, but some drop back with xdp devmap prog attached.
Version | Test | Generic | Native | Native + 2nd xdp_prog
5.12 rc4 | xdp_redirect_map i40e->i40e | 1.9M | 9.6M | 8.4M
5.12 rc4 | xdp_redirect_map i40e->veth | 1.7M | 11.7M | 9.8M
5.12 rc4 + patch | xdp_redirect_map i40e->i40e | 1.9M | 9.8M | 8.0M
5.12 rc4 + patch | xdp_redirect_map i40e->veth | 1.7M | 12.0M | 9.4M
When bq_xmit_all() is called from bq_enqueue(), another packet will
always be enqueued immediately after, so clearing dev_rx, xdp_prog and
flush_node in bq_xmit_all() is redundant. Move the clear to __dev_flush(),
and only check them once in bq_enqueue() since they are all modified
together.
This change also has the side effect of extending the lifetime of the
RCU-protected xdp_prog that lives inside the devmap entries: Instead of
just living for the duration of the XDP program invocation, the
reference now lives all the way until the bq is flushed. This is safe
because the bq flush happens at the end of the NAPI poll loop, so
everything happens between a local_bh_disable()/local_bh_enable() pair.
However, this is by no means obvious from looking at the call sites; in
particular, some drivers have an additional rcu_read_lock() around only
the XDP program invocation, which only confuses matters further.
Cleaning this up will be done in a separate patch series.
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20210519090747.1655268-2-liuhangbin@gmail.com
Alexei Starovoitov [Wed, 26 May 2021 00:32:35 +0000 (17:32 -0700)]
Merge branch 'libbpf: error reporting changes for v1.0'
Andrii Nakryiko says:
====================
Implement error reporting changes discussed in "Libbpf: the road to v1.0"
([0]) document.
Libbpf gets a new API, libbpf_set_strict_mode() which accepts a set of flags
that turn on a set of libbpf 1.0 changes, that might be potentially breaking.
It's possible to opt-in into all current and future 1.0 features by specifying
LIBBPF_STRICT_ALL flag.
When some of the 1.0 "features" are requested, libbpf APIs might behave
differently. In this patch set a first set of changes are implemented, all
related to the way libbpf returns errors. See individual patches for details.
Patch #1 adds a no-op libbpf_set_strict_mode() functionality to enable
updating selftests.
Patch #2 gets rid of all the bad code patterns that will break in libbpf 1.0
(exact -1 comparison for low-level APIs, direct IS_ERR() macro usage to check
pointer-returning APIs for error, etc). These changes make selftest work in
both legacy and 1.0 libbpf modes. Selftests also opt-in into 100% libbpf 1.0
mode to automatically gain all the subsequent changes, which will come in
follow up patches.
Patch #3 streamlines error reporting for low-level APIs wrapping bpf() syscall.
Patch #4 streamlines errors for all the rest APIs.
Patch #5 ensures that BPF skeletons propagate errors properly as well, as
currently on error some APIs will return NULL with no way of checking exact
error code.
[0] https://docs.google.com/document/d/1UyjTZuPFWiPFyKk1tV5an11_iaRuec6U-ZESZ54nNTY
v1->v2:
- move libbpf_set_strict_mode() implementation to patch #1, where it belongs
(Alexei);
- add acks, slight rewording of commit messages.
====================
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Andrii Nakryiko [Tue, 25 May 2021 03:59:35 +0000 (20:59 -0700)]
bpftool: Set errno on skeleton failures and propagate errors
Follow libbpf's error handling conventions and pass through errors and errno
properly. Skeleton code always returned NULL on errors (not ERR_PTR(err)), so
there are no backwards compatibility concerns. But now we also set errno
properly, so it's possible to distinguish different reasons for failure, if
necessary.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://lore.kernel.org/bpf/20210525035935.1461796-6-andrii@kernel.org
Andrii Nakryiko [Tue, 25 May 2021 03:59:34 +0000 (20:59 -0700)]
libbpf: Streamline error reporting for high-level APIs
Implement changes to error reporting for high-level libbpf APIs to make them
less surprising and less error-prone to users:
- in all the cases when error happens, errno is set to an appropriate error
value;
- in libbpf 1.0 mode, all pointer-returning APIs return NULL on error and
error code is communicated through errno; this applies both to APIs that
already returned NULL before (so now they communicate more detailed error
codes), as well as for many APIs that used ERR_PTR() macro and encoded
error numbers as fake pointers.
- in legacy (default) mode, those APIs that were returning ERR_PTR(err),
continue doing so, but still set errno.
With these changes, errno can be always used to extract actual error,
regardless of legacy or libbpf 1.0 modes. This is utilized internally in
libbpf in places where libbpf uses it's own high-level APIs.
libbpf_get_error() is adapted to handle both cases completely transparently to
end-users (and is used by libbpf consistently as well).
More context, justification, and discussion can be found in "Libbpf: the road
to v1.0" document ([0]).
[0] https://docs.google.com/document/d/1UyjTZuPFWiPFyKk1tV5an11_iaRuec6U-ZESZ54nNTY
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://lore.kernel.org/bpf/20210525035935.1461796-5-andrii@kernel.org
Andrii Nakryiko [Tue, 25 May 2021 03:59:33 +0000 (20:59 -0700)]
libbpf: Streamline error reporting for low-level APIs
Ensure that low-level APIs behave uniformly across the libbpf as follows:
- in case of an error, errno is always set to the correct error code;
- when libbpf 1.0 mode is enabled with LIBBPF_STRICT_DIRECT_ERRS option to
libbpf_set_strict_mode(), return -Exxx error value directly, instead of -1;
- by default, until libbpf 1.0 is released, keep returning -1 directly.
More context, justification, and discussion can be found in "Libbpf: the road
to v1.0" document ([0]).
[0] https://docs.google.com/document/d/1UyjTZuPFWiPFyKk1tV5an11_iaRuec6U-ZESZ54nNTY
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://lore.kernel.org/bpf/20210525035935.1461796-4-andrii@kernel.org
Andrii Nakryiko [Tue, 25 May 2021 03:59:32 +0000 (20:59 -0700)]
selftests/bpf: Turn on libbpf 1.0 mode and fix all IS_ERR checks
Turn ony libbpf 1.0 mode. Fix all the explicit IS_ERR checks that now will be
broken because libbpf returns NULL on error (and sets errno). Fix
ASSERT_OK_PTR and ASSERT_ERR_PTR to work for both old mode and new modes and
use them throughout selftests. This is trivial to do by using
libbpf_get_error() API that all libbpf users are supposed to use, instead of
IS_ERR checks.
A bunch of checks also did explicit -1 comparison for various fd-returning
APIs. Such checks are replaced with >= 0 or < 0 cases.
There were also few misuses of bpf_object__find_map_by_name() in test_maps.
Those are fixed in this patch as well.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://lore.kernel.org/bpf/20210525035935.1461796-3-andrii@kernel.org
Andrii Nakryiko [Tue, 25 May 2021 03:59:31 +0000 (20:59 -0700)]
libbpf: Add libbpf_set_strict_mode() API to turn on libbpf 1.0 behaviors
Add libbpf_set_strict_mode() API that allows application to simulate libbpf
1.0 breaking changes before libbpf 1.0 is released. This will help users
migrate gradually and with confidence.
For now only ALL or NONE options are available, subsequent patches will add
more flags. This patch is preliminary for selftests/bpf changes.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://lore.kernel.org/bpf/20210525035935.1461796-2-andrii@kernel.org
Magnus Karlsson [Fri, 21 May 2021 08:33:01 +0000 (10:33 +0200)]
xsk: Use kvcalloc to support large umems
Use kvcalloc() instead of kcalloc() to support large umems with, on my
server, one million pages or more in the umem.
Reported-by: Dan Siemon <dan@coverfire.com>
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Björn Töpel <bjorn@kernel.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20210521083301.26921-1-magnus.karlsson@gmail.com
Zhen Lei [Tue, 25 May 2021 02:56:59 +0000 (10:56 +0800)]
bpf: Fix spelling mistakes
Fix some spelling mistakes in comments:
aother ==> another
Netiher ==> Neither
desribe ==> describe
intializing ==> initializing
funciton ==> function
wont ==> won't and move the word 'the' at the end to the next line
accross ==> across
pathes ==> paths
triggerred ==> triggered
excute ==> execute
ether ==> either
conervative ==> conservative
convetion ==> convention
markes ==> marks
interpeter ==> interpreter
Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210525025659.8898-2-thunder.leizhen@huawei.com
Aditya Srivastava [Sun, 23 May 2021 15:14:08 +0000 (20:44 +0530)]
samples: bpf: Ix kernel-doc syntax in file header
The opening comment mark '/**' is used for highlighting the beginning of
kernel-doc comments.
The header for samples/bpf/ibumad_kern.c follows this syntax, but
the content inside does not comply with kernel-doc.
This line was probably not meant for kernel-doc parsing, but is parsed
due to the presence of kernel-doc like comment syntax(i.e, '/**'), which
causes unexpected warnings from kernel-doc:
warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst
* ibumad BPF sample kernel side
Provide a simple fix by replacing this occurrence with general comment
format, i.e. '/*', to prevent kernel-doc from parsing it.
Signed-off-by: Aditya Srivastava <yashsri421@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Link: https://lore.kernel.org/bpf/20210523151408.22280-1-yashsri421@gmail.com
Yonghong Song [Sat, 22 May 2021 16:23:41 +0000 (09:23 -0700)]
libbpf: Add support for new llvm bpf relocations
LLVM patch https://reviews.llvm.org/D102712
narrowed the scope of existing R_BPF_64_64
and R_BPF_64_32 relocations, and added three
new relocations, R_BPF_64_ABS64, R_BPF_64_ABS32
and R_BPF_64_NODYLD32. The main motivation is
to make relocations linker friendly.
This change, unfortunately, breaks libbpf build,
and we will see errors like below:
libbpf: ELF relo #0 in section #6 has unexpected type 2 in
/home/yhs/work/bpf-next/tools/testing/selftests/bpf/bpf_tcp_nogpl.o
Error: failed to link
'/home/yhs/work/bpf-next/tools/testing/selftests/bpf/bpf_tcp_nogpl.o':
Unknown error -22 (-22)
The new relocation R_BPF_64_ABS64 is generated
and libbpf linker sanity check doesn't understand it.
Relocation section '.rel.struct_ops' at offset 0x1410 contains 1 entries:
Offset Info Type Symbol's Value Symbol's Name
0000000000000018 0000000700000002 R_BPF_64_ABS64
0000000000000000 nogpltcp_init
Look at the selftests/bpf/bpf_tcp_nogpl.c,
void BPF_STRUCT_OPS(nogpltcp_init, struct sock *sk)
{
}
SEC(".struct_ops")
struct tcp_congestion_ops bpf_nogpltcp = {
.init = (void *)nogpltcp_init,
.name = "bpf_nogpltcp",
};
The new llvm relocation scheme categorizes 'nogpltcp_init' reference
as R_BPF_64_ABS64 instead of R_BPF_64_64 which is used to specify
ld_imm64 relocation in the new scheme.
Let us fix the linker sanity checking by including
R_BPF_64_ABS64 and R_BPF_64_ABS32. There is no need to
check R_BPF_64_NODYLD32 which is used for .BTF and .BTF.ext.
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20210522162341.3687617-1-yhs@fb.com
Andrii Nakryiko [Mon, 24 May 2021 20:30:53 +0000 (13:30 -0700)]
Merge branch 'Add lookup_and_delete_elem support to BPF hash map types'
Denis Salopek says:
====================
This patch series extends the existing bpf_map_lookup_and_delete_elem()
functionality with 4 more map types:
- BPF_MAP_TYPE_HASH,
- BPF_MAP_TYPE_PERCPU_HASH,
- BPF_MAP_TYPE_LRU_HASH and
- BPF_MAP_TYPE_LRU_PERCPU_HASH.
Patch 1 adds most of its functionality and logic as well as
documentation.
As it was previously limited to only stacks and queues which do not
support the BPF_F_LOCK flag, patch 2 enables its usage by adding a new
libbpf API bpf_map_lookup_and_delete_elem_flags() based on the existing
bpf_map_lookup_elem_flags().
Patch 3 adds selftests for lookup_and_delete_elem().
Changes in patch 1:
v7: Minor formating nits, add Acked-by.
v6: Remove unneeded flag check, minor code/format fixes.
v5: Split patch to 3 patches. Extend BPF_MAP_LOOKUP_AND_DELETE_ELEM
documentation with this changes.
v4: Fix the return value for unsupported map types.
v3: Add bpf_map_lookup_and_delete_elem_flags() and enable BPF_F_LOCK
flag, change CHECKs to ASSERT_OKs, initialize variables to 0.
v2: Add functionality for LRU/per-CPU, add test_progs tests.
Changes in patch 2:
v7: No change.
v6: Add Acked-by.
v5: Move to the newest libbpf version (0.4.0).
Changes in patch 3:
v7: Remove ASSERT_GE macro which is already added in some other commit,
change ASSERT_OK to ASSERT_OK_PTR, add Acked-by.
v6: Remove PERCPU macros, add ASSERT_GE macro to test_progs.h, remove
leftover code.
v5: Use more appropriate macros. Better check for changed value.
====================
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Denis Salopek [Tue, 11 May 2021 21:00:06 +0000 (23:00 +0200)]
selftests/bpf: Add bpf_lookup_and_delete_elem tests
Add bpf selftests and extend existing ones for a new function
bpf_lookup_and_delete_elem() for (percpu) hash and (percpu) LRU hash map
types.
In test_lru_map and test_maps we add an element, lookup_and_delete it,
then check whether it's deleted.
The newly added lookup_and_delete prog tests practically do the same
thing but additionally use a BPF program to change the value of the
element for LRU maps.
Signed-off-by: Denis Salopek <denis.salopek@sartura.hr>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/d30d3e0060c1f750e133579623cf1c60ff58f3d9.1620763117.git.denis.salopek@sartura.hr
Denis Salopek [Tue, 11 May 2021 21:00:05 +0000 (23:00 +0200)]
bpf: Extend libbpf with bpf_map_lookup_and_delete_elem_flags
Add bpf_map_lookup_and_delete_elem_flags() libbpf API in order to use
the BPF_F_LOCK flag with the map_lookup_and_delete_elem() function.
Signed-off-by: Denis Salopek <denis.salopek@sartura.hr>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/15b05dafe46c7e0750d110f233977372029d1f62.1620763117.git.denis.salopek@sartura.hr
Denis Salopek [Tue, 11 May 2021 21:00:04 +0000 (23:00 +0200)]
bpf: Add lookup_and_delete_elem support to hashtab
Extend the existing bpf_map_lookup_and_delete_elem() functionality to
hashtab map types, in addition to stacks and queues.
Create a new hashtab bpf_map_ops function that does lookup and deletion
of the element under the same bucket lock and add the created map_ops to
bpf.h.
Signed-off-by: Denis Salopek <denis.salopek@sartura.hr>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/4d18480a3e990ffbf14751ddef0325eed3be2966.1620763117.git.denis.salopek@sartura.hr
Stanislav Fomichev [Fri, 21 May 2021 03:06:53 +0000 (20:06 -0700)]
libbpf: Skip bpf_object__probe_loading for light skeleton
I'm getting the following error when running 'gen skeleton -L' as
regular user:
libbpf: Error in bpf_object__probe_loading():Operation not permitted(1).
Couldn't load trivial BPF program. Make sure your kernel supports BPF
(CONFIG_BPF_SYSCALL=y) and/or that RLIMIT_MEMLOCK is set to big enough
value.
Fixes:
67234743736a ("libbpf: Generate loader program out of BPF ELF file.")
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210521030653.2626513-1-sdf@google.com
YueHaibing [Mon, 24 May 2021 01:07:01 +0000 (09:07 +0800)]
ethernet: ucc_geth: Use kmemdup() rather than kmalloc+memcpy
Issue identified with Coccinelle.
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Sun, 23 May 2021 15:58:42 +0000 (08:58 -0700)]
net: r6040: Allow restarting auto-negotiation
Use phy_ethtool_nway_reset() since the driver makes use of the PHY
library.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 24 May 2021 00:20:53 +0000 (17:20 -0700)]
Merge branch 'r6040-cleanups'
Florian Fainelli says:
====================
net: r6040: Non-functional changes
These two patches clean up the r6040 driver a little bit in preparation
for adding additional features such as dumping MAC counters and properly
dealing with DMA-API mapping.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Sun, 23 May 2021 15:54:11 +0000 (08:54 -0700)]
net: r6040: Use ETH_FCS_LEN
Instead of the open coded constant 4.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Sun, 23 May 2021 15:54:10 +0000 (08:54 -0700)]
net: r6040: Use logical or for MDIO operations
This is not a functional change, but we should be using a logical or to
assign the bits we will be writing to the MDIO read and write registers.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
YueHaibing [Sun, 23 May 2021 06:02:23 +0000 (14:02 +0800)]
ehea: Use DEVICE_ATTR_*() macro
Use DEVICE_ATTR_*() helper instead of plain DEVICE_ATTR,
which makes the code a bit shorter and easier to read.
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
YueHaibing [Sun, 23 May 2021 03:24:09 +0000 (11:24 +0800)]
sfc: falcon: use DEVICE_ATTR_*() macro
Use DEVICE_ATTR_*() helper instead of plain DEVICE_ATTR,
which makes the code a bit shorter and easier to read.
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
YueHaibing [Sun, 23 May 2021 03:20:30 +0000 (11:20 +0800)]
sfc: use DEVICE_ATTR_*() macro
Use DEVICE_ATTR_*() helper instead of plain DEVICE_ATTR,
which makes the code a bit shorter and easier to read.
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Yang Yingliang [Sat, 22 May 2021 12:02:46 +0000 (20:02 +0800)]
net: ftgmac100: add missing error return code in ftgmac100_probe()
The variables will be free on path err_phy_connect, it should
return error code, or it will cause double free when calling
ftgmac100_remove().
Fixes:
bd466c3fb5a4 ("net/faraday: Support NCSI mode")
Fixes:
39bfab8844a0 ("net: ftgmac100: Add support for DT phy-handle property")
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vladimir Oltean [Fri, 21 May 2021 14:12:20 +0000 (17:12 +0300)]
dpaa2-eth: don't print error from dpaa2_mac_connect if that's EPROBE_DEFER
When booting a board with DPAA2 interfaces defined statically via DPL
(as opposed to creating them dynamically using restool), the driver will
print an unspecific error message.
This change adds the error code to the message, and avoids printing
altogether if the error code is EPROBE_DEFER, because that is not a
cause of alarm.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 21 May 2021 21:05:04 +0000 (14:05 -0700)]
Merge branch 'dpaa2-eth-of_node'
Ioana Ciornei says:
====================
dpaa2-eth: setup the of_node
This patch set allows DSA to work with a DPAA2 master device by setting
up the of_node to point to the corresponding MAC DTS node, so that
of_find_net_device_by_node() can find it.
As an added benefit, udev rules can now be used to create a naming
scheme based on the physical MAC.
The second patch renames the debugfs directory to use the DPNI name
instead of the netdev name, since the latter can be changed after probe
time.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Ioana Ciornei [Fri, 21 May 2021 13:25:30 +0000 (16:25 +0300)]
dpaa2-eth: name the debugfs directory after the DPNI object
Name the debugfs directory after the DPNI object instead of the netdev
name since this can be changed after probe by udev rules.
Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ioana Ciornei [Fri, 21 May 2021 13:25:29 +0000 (16:25 +0300)]
dpaa2-eth: setup the of_node field of the device
When the DPNI object is connected to a DPMAC, setup the of_node to point
to the DTS device node of that specific MAC. This enables other drivers,
for example the DSA subsystem, to find the net_device by its device
node.
Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Tested-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 21 May 2021 21:01:41 +0000 (14:01 -0700)]
Merge branch 'sja1105-stats'
Vladimir Oltean says:
====================
Ethtool statistics counters cleanup for SJA1105 DSA driver
This series removes some reported data from ethtool -S which were not
counters at all, and reorganizes the code such that counters can be read
individually and not just all at once.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Vladimir Oltean [Fri, 21 May 2021 13:16:08 +0000 (16:16 +0300)]
net: dsa: sja1105: don't use burst SPI reads for port statistics
The current internal sja1105 driver API is optimized for retrieving many
statistics counters at once. But the switch does not do atomic snapshotting
for them anyway.
In case we start reporting the hardware port counters through
ndo_get_stats64 as well, not just ethtool, it would be good to be able
to read individual port counters and not all of them.
Additionally, since Arnd Bergmann's commit
ae1804de93f6 ("dsa: sja1105:
dynamically allocate stats structure"), sja1105_get_ethtool_stats
allocates memory dynamically, since struct sja1105_port_status was
deemed to consume too much stack memory. That is not ideal.
The large structure is only needed because of the burst read.
If we read statistics one by one, we can consume less memory, and
we can avoid dynamic allocation.
Additionally, latency-sensitive interfaces such as PTP operations (for
phc2sys) might suffer if the SPI mutex is being held for too long, which
happens in the case of SPI burst reads. By reading counters one by one,
we give a chance for higher priority processes to preempt and take the
SPI bus mutex for accessing the PTP clock.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vladimir Oltean [Fri, 21 May 2021 13:16:07 +0000 (16:16 +0300)]
net: dsa: sja1105: stop reporting the queue levels in ethtool port counters
The queue levels are not counters, but instead they represent the
occupancy of the MAC TX queues. Having these in ethtool port counters is
not helpful, so remove them.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Colin Ian King [Fri, 21 May 2021 10:01:46 +0000 (11:01 +0100)]
net: hns3: Fix return of uninitialized variable ret
In the unlikely event that rule_cnt is zero the variable ret is
not assigned a value and function hclge_dbg_dump_fd_tcam can end
up returning an unitialized value in ret. Fix this by explicitly
setting ret to zero before the for-loop.
Addresses-Coverity: ("Uninitialized scalar variable")
Fixes:
b5a0b70d77b9 ("net: hns3: refactor dump fd tcam of debugfs")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
zuoqilin [Fri, 21 May 2021 09:45:22 +0000 (17:45 +0800)]
atm: Fix typo
Change 'contol' to 'control'.
Signed-off-by: zuoqilin <zuoqilin@yulong.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiapeng Chong [Fri, 21 May 2021 09:40:14 +0000 (17:40 +0800)]
net: phy: Fix inconsistent indenting
Eliminate the follow smatch warning:
drivers/net/phy/phy_device.c:2886 phy_probe() warn: inconsistent
indenting.
Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Yang Yingliang [Fri, 21 May 2021 03:57:21 +0000 (11:57 +0800)]
sfc: farch: fix compile warning in efx_farch_dimension_resources()
Fix the following kernel build warning when CONFIG_SFC_SRIOV is disabled:
drivers/net/ethernet/sfc/farch.c: In function ‘efx_farch_dimension_resources’:
drivers/net/ethernet/sfc/farch.c:1671:21: warning: variable ‘buftbl_min’ set but not used [-Wunused-but-set-variable]
unsigned vi_count, buftbl_min, total_tx_channels;
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Wang Hai [Fri, 21 May 2021 03:31:35 +0000 (11:31 +0800)]
net: bonding: bond_alb: Fix some typos in bond_alb.c
s/becase/because/
s/reqeusts/requests/
s/funcions/functions/
s/addreses/addresses/
Signed-off-by: Wang Hai <wanghai38@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Wang Hai [Fri, 21 May 2021 03:24:55 +0000 (11:24 +0800)]
caif_virtio: Fix some typos in caif_virtio.c
s/patckets/packets/
s/avilable/available/
s/tbe/the/
Signed-off-by: Wang Hai <wanghai38@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 21 May 2021 20:26:41 +0000 (13:26 -0700)]
Merge branch 'wan-cleanups'
Guangbin Huang says:
====================
net: wan: clean up some code style issues
This patchset clean up some code style issues.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Peng Li [Fri, 21 May 2021 01:08:17 +0000 (09:08 +0800)]
net: wan: add necessary () to macro argument
Macro argument 'card' and 'port' may be better as
'(card)' and '(port)' to avoid precedence issues.
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Peng Li [Fri, 21 May 2021 01:08:16 +0000 (09:08 +0800)]
net: wan: add braces {} to all arms of the statement
Braces {} should be used on all arms of this statement.
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Peng Li [Fri, 21 May 2021 01:08:15 +0000 (09:08 +0800)]
net: wan: remove redundant blank lines
This patch removes some redundant blank lines.
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Peng Li [Fri, 21 May 2021 01:08:14 +0000 (09:08 +0800)]
net: wan: fix the code style issue about trailing statements
Trailing statements should be on next line.
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Peng Li [Fri, 21 May 2021 01:08:13 +0000 (09:08 +0800)]
net: wan: add some required spaces
Add space required after that close brace '}'.
Add space required before the open parenthesis '('.
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Peng Li [Fri, 21 May 2021 01:08:12 +0000 (09:08 +0800)]
net: wan: fix an code style issue about "foo* bar"
Fix the checkpatch error as "foo* bar" should be "foo *bar".
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 21 May 2021 20:23:29 +0000 (13:23 -0700)]
Merge branch 'sja1105-spi'
Vladimir Oltean says:
====================
Adapt the sja1105 DSA driver to the SPI controller's transfer limits
This series changes the SPI transfer procedure in sja1105 to take into
consideration the buffer size limitations that the SPI controller driver
might have.
Changes in v2:
Remove the driver's use of cs_change and send multiple, smaller SPI
messages instead of a single large one.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Vladimir Oltean [Thu, 20 May 2021 21:16:57 +0000 (00:16 +0300)]
net: dsa: sja1105: adapt to a SPI controller with a limited max transfer size
The static config of the sja1105 switch is a long stream of bytes which
is programmed to the hardware in chunks (portions with the chip select
continuously asserted) of max 256 bytes each. Each chunk is a
spi_message composed of 2 spi_transfers: the buffer with the data and a
preceding buffer with the SPI access header.
Only that certain SPI controllers, such as the spi-sc18is602 I2C-to-SPI
bridge, cannot keep the chip select asserted for that long.
The spi_max_transfer_size() and spi_max_message_size() functions are how
the controller can impose its hardware limitations upon the SPI
peripheral driver.
For the sja1105 driver to work with these controllers, both buffers must
be smaller than the transfer limit, and their sum must be smaller than
the message limit.
Regression-tested on a switch connected to a controller with no
limitations (spi-fsl-dspi) as well as with one with caps for both
max_transfer_size and max_message_size (spi-sc18is602).
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vladimir Oltean [Thu, 20 May 2021 21:16:56 +0000 (00:16 +0300)]
net: dsa: sja1105: send multiple spi_messages instead of using cs_change
The sja1105 driver has been described by Mark Brown as "not using the
[ SPI ] API at all idiomatically" due to the use of cs_change:
https://patchwork.kernel.org/project/netdevbpf/patch/
20210520135031.2969183-1-olteanv@gmail.com/
According to include/linux/spi/spi.h, the chip select is supposed to be
asserted for the entire length of a SPI message, as long as cs_change is
false for all member transfers. The cs_change flag changes the following:
(i) When a non-final SPI transfer has cs_change = true, the chip select
should temporarily deassert and then reassert starting with the next
transfer.
(ii) When a final SPI transfer has cs_change = true, the chip select
should remain asserted until the following SPI message.
The sja1105 driver only uses cs_change for its first property, to form a
single SPI message whose layout can be seen below:
this is an entire, single spi_message
_______________________________________________________________________________________________
/ \
+-------------+---------------+-------------+---------------+ ... +-------------+---------------+
| hdr_xfer[0] | chunk_xfer[0] | hdr_xfer[1] | chunk_xfer[1] | | hdr_xfer[n] | chunk_xfer[n] |
+-------------+---------------+-------------+---------------+ ... +-------------+---------------+
cs_change false true false true false false
____________________________ _____________________________ _____________________________
CS line __/ \/ \ ... / \__
The fact of the matter is that spi_max_message_size() has an ambiguous
meaning if any non-final transfer has cs_change = true.
If the SPI master has a limitation in that it cannot keep the chip
select asserted for more than, say, 200 bytes (like the spi-sc18is602),
the normal thing for it to do is to implement .max_transfer_size and
.max_message_size, and limit both to 200: in the "worst case" where
cs_change is always false, then the controller can, indeed, not send
messages larger than 200 bytes.
But the fact that the SPI controller's max_message_size does not
necessarily mean that we cannot send messages larger than that.
Notably, if the SPI master special-cases the transfers with cs_change
and treats every chip select toggling as an entirely new transaction,
then a SPI message can easily exceed that limit. So there is a
temptation to ignore the controller's reported max_message_size when
using cs_change = true in non-final transfers.
But that can lead to false conclusions. As Mark points out, the SPI
controller might have a different kind of limitation with the max
message size, that has nothing at all to do with how long it can keep
the chip select asserted.
For example, that might be the case if the device is able to offload the
chip select changes to the hardware as part of the data stream, and it
packs the entire stream of commands+data (corresponding to a SPI
message) into a single DMA transfer that is itself limited in size.
So the only thing we can do is avoid ambiguity by not using cs_change at
all. Instead of sending a single spi_message, we now send multiple SPI
messages as follows:
spi_message 0 spi_message 1 spi_message n
____________________________ ___________________________ _____________________________
/ \ / \ / \
+-------------+---------------+-------------+---------------+ ... +-------------+---------------+
| hdr_xfer[0] | chunk_xfer[0] | hdr_xfer[1] | chunk_xfer[1] | | hdr_xfer[n] | chunk_xfer[n] |
+-------------+---------------+-------------+---------------+ ... +-------------+---------------+
cs_change false true false true false false
____________________________ _____________________________ _____________________________
CS line __/ \/ \ ... / \__
which is clearer because the max_message_size limit is now easier to
enforce. What is transmitted on the wire stays, of course, the same.
Additionally, because we send no more than 2 transfers at a time, we now
avoid dynamic memory allocation too, which might be seen as an
improvement by some.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Peter Geis [Thu, 20 May 2021 16:32:30 +0000 (12:32 -0400)]
net: phy: add driver for Motorcomm yt8511 phy
Add a driver for the Motorcomm yt8511 phy that will be used in the
production Pine64 rk3566-quartz64 development board.
It supports gigabit transfer speeds, rgmii, and 125mhz clk output.
Signed-off-by: Peter Geis <pgwipeout@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
YueHaibing [Wed, 19 May 2021 02:47:04 +0000 (10:47 +0800)]
net: xilinx_emaclite: Do not print real IOMEM pointer
Printing kernel pointers is discouraged because they might leak kernel
memory layout. This fixes smatch warning:
drivers/net/ethernet/xilinx/xilinx_emaclite.c:1191 xemaclite_of_probe() warn:
argument 4 to %08lX specifier is cast from pointer
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
YueHaibing [Thu, 20 May 2021 13:46:19 +0000 (21:46 +0800)]
net: cdc_ncm: use DEVICE_ATTR_RW macro
Use DEVICE_ATTR_RW helper instead of plain DEVICE_ATTR,
which makes the code a bit shorter and easier to read.
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
YueHaibing [Thu, 20 May 2021 13:41:16 +0000 (21:41 +0800)]
net: usb: hso: use DEVICE_ATTR_RO macro
Use DEVICE_ATTR_RO helper instead of plain DEVICE_ATTR,
which makes the code a bit shorter and easier to read.
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
YueHaibing [Thu, 20 May 2021 13:36:45 +0000 (21:36 +0800)]
net: atm: use DEVICE_ATTR_RO macro
Use DEVICE_ATTR_RO helper instead of plain DEVICE_ATTR,
which makes the code a bit shorter and easier to read.
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Po-Hsu Lin [Thu, 20 May 2021 10:49:54 +0000 (18:49 +0800)]
selftests: net: devlink_port_split.py: skip the test if no devlink device
When there is no devlink device, the following command will return:
$ devlink -j dev show
{dev:{}}
This will cause IndexError when trying to access the first element
in dev of this json dataset. Use the kselftest framework skip code
to skip this test in this case.
Example output with this change:
# selftests: net: devlink_port_split.py
# no devlink device was found, test skipped
ok 7 selftests: net: devlink_port_split.py # SKIP
Link: https://bugs.launchpad.net/bugs/1928889
Signed-off-by: Po-Hsu Lin <po-hsu.lin@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Michal Suchanek [Thu, 20 May 2021 06:50:34 +0000 (08:50 +0200)]
ibmvnic: remove default label from to_string switch
This way the compiler warns when a new value is added to the enum but
not to the string translation like:
drivers/net/ethernet/ibm/ibmvnic.c: In function 'adapter_state_to_string':
drivers/net/ethernet/ibm/ibmvnic.c:832:2: warning: enumeration value 'VNIC_FOOBAR' not handled in switch [-Wswitch]
switch (state) {
^~~~~~
drivers/net/ethernet/ibm/ibmvnic.c: In function 'reset_reason_to_string':
drivers/net/ethernet/ibm/ibmvnic.c:1935:2: warning: enumeration value 'VNIC_RESET_FOOBAR' not handled in switch [-Wswitch]
switch (reason) {
^~~~~~
Signed-off-by: Michal Suchanek <msuchanek@suse.de>
Acked-by: Lijun Pan <lijunp213@gmail.com>
Link: https://lore.kernel.org/netdev/CAOhMmr701LecfuNM+EozqbiTxFvDiXjFdY2aYeKJYaXq9kqVDg@mail.gmail.com/
Signed-off-by: David S. Miller <davem@davemloft.net>
wengjianfeng [Thu, 20 May 2021 01:05:50 +0000 (09:05 +0800)]
NFC: st21nfca: remove unnecessary variable and labels
assign vlue (EIO/EPROTO) to variable r, and goto exit label,
but just return r follow exit label, so we delete exit label,
and just replace with return sentence.
Signed-off-by: wengjianfeng <wengjianfeng@yulong.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 20 May 2021 22:43:25 +0000 (15:43 -0700)]
Merge branch 'bond-cleanups'
Guangbin Huang says:
====================
net: bonding: clean up some code style issues
This patchset cleans up some code style issues.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Yufeng Mo [Thu, 20 May 2021 06:18:35 +0000 (14:18 +0800)]
net: bonding: use tabs instead of space for code indent
Code indent should use tabs where possible, so
use tabs instead of space for code indent.
Signed-off-by: Yufeng Mo <moyufeng@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Yufeng Mo [Thu, 20 May 2021 06:18:34 +0000 (14:18 +0800)]
net: bonding: remove unnecessary braces
Braces {} are not necessary for single statement blocks,
so remove these braces {}.
Signed-off-by: Yufeng Mo <moyufeng@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Yufeng Mo [Thu, 20 May 2021 06:18:33 +0000 (14:18 +0800)]
net: bonding: fix code indent for conditional statements
Fix incorrect code indent for conditional statements.
Signed-off-by: Yufeng Mo <moyufeng@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Yufeng Mo [Thu, 20 May 2021 06:18:32 +0000 (14:18 +0800)]
net: bonding: add some required blank lines
Add some blank lines after declarations as required.
Signed-off-by: Yufeng Mo <moyufeng@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 20 May 2021 22:13:28 +0000 (15:13 -0700)]
Merge branch '1GbE' of git://git./linux/kernel/git/tnguy/next-queue
Tony Nguyen says:
====================
1GbE Intel Wired LAN Driver Updates 2021-05-20
This series contains updates to igc driver only.
Andre Guedes says:
This series adds AF_XDP zero-copy feature to igc driver.
The initial patches do some code refactoring, preparing the code base to
land the AF_XDP zero-copy feature, avoiding code duplications. The last
patches of the series are the ones implementing the feature.
The last patch which indeed implements AF_XDP zero-copy support was
originally way too lengthy so, for the sake of code review, I broke it
up into two patches: one adding support for the RX functionality and the
other one adding TX support.
---
v2:
Patch 8/9 - "igc: Enable RX via AF_XDP zero-copy"
* In XDP_PASS flow, copy metadata too into the skb.
* When HW timestamp is added by the NIC, after copying it into
a local variable, update xdp_buff->data_meta so that
metadata length when XDP program is called 0.
* In igc_xdp_enable_pool(), call xsk_pool_dma_unmap() on
failure.
Known issues:
When an XDP application is running in Tx-Only mode with Zero-Copy
enabled, it is not expected to add the frames to the fill-queue. I have
noticed the following two issues in this scenario:
- If XDP_USE_NEED_WAKEUP flag is not set by application, igc_poll()
will go into infinite loop because the buffer allocation resulting
in igc_clean_rx_irq_zc() indicating that all work is not done and NAPI
should keep polling. This does not occur if XDP_USE_NEED_WAKEUP
flag is set.
- Since there are no buffers allocated by userspace for the fill
queue, there is no memory allocated for the NIC to copy the data
to. If the packet received is destined to the hardware queue where
XDP application is running, no packets are received even on other
queues.
Both these issues can be mitigated by adding a few frames to the
fill queue. The second issue can also be mitigated by making sure no
packets are being received on the hardware queue where Rx is running.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 20 May 2021 22:10:57 +0000 (15:10 -0700)]
Merge branch 'net-leading-spaces'
Hui Tang says:
====================
net: remove leading spaces before tabs
There are a few leading spaces before tabs and remove it by running the
following commard:
$ find . -name '*.[ch]' | xargs sed -r -i 's/^[ ]+\t/\t/'
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Hui Tang [Thu, 20 May 2021 03:47:54 +0000 (11:47 +0800)]
mii: remove leading spaces before tabs
There are a few leading spaces before tabs and remove it by running
the following commard:
$ find . -name '*.[ch]' | xargs sed -r -i 's/^[ ]+\t/\t/'
Signed-off-by: Hui Tang <tanghui20@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Hui Tang [Thu, 20 May 2021 03:47:53 +0000 (11:47 +0800)]
ifb: remove leading spaces before tabs
There are a few leading spaces before tabs and remove it by running
the following commard:
$ find . -name '*.[ch]' | xargs sed -r -i 's/^[ ]+\t/\t/'
Signed-off-by: Hui Tang <tanghui20@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Hui Tang [Thu, 20 May 2021 03:47:52 +0000 (11:47 +0800)]
net: appletalk: remove leading spaces before tabs
There are a few leading spaces before tabs and remove it by running
the following commard:
$ find . -name '*.[ch]' | xargs sed -r -i 's/^[ ]+\t/\t/'
Cc: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Signed-off-by: Hui Tang <tanghui20@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Hui Tang [Thu, 20 May 2021 03:47:51 +0000 (11:47 +0800)]
net: fddi: skfp: remove leading spaces before tabs
There are a few leading spaces before tabs and remove it by running
the following commard:
$ find . -name '*.[ch]' | xargs sed -r -i 's/^[ ]+\t/\t/'
Cc: Lee Jones <lee.jones@linaro.org>
Signed-off-by: Hui Tang <tanghui20@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Hui Tang [Thu, 20 May 2021 03:47:50 +0000 (11:47 +0800)]
net: hamradio: remove leading spaces before tabs
There are a few leading spaces before tabs and remove it by running
the following commard:
$ find . -name '*.[ch]' | xargs sed -r -i 's/^[ ]+\t/\t/'
Cc: Jiri Slaby <jirislaby@kernel.org>
Signed-off-by: Hui Tang <tanghui20@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Hui Tang [Thu, 20 May 2021 03:47:49 +0000 (11:47 +0800)]
net: ppp: remove leading spaces before tabs
There are a few leading spaces before tabs and remove it by running
the following commard:
$ find . -name '*.[ch]' | xargs sed -r -i 's/^[ ]+\t/\t/'
Cc: Tom Parkin <tparkin@katalix.com>
Signed-off-by: Hui Tang <tanghui20@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Hui Tang [Thu, 20 May 2021 03:47:48 +0000 (11:47 +0800)]
net: slip: remove leading spaces before tabs
There are a few leading spaces before tabs and remove it by running
the following commard:
$ find . -name '*.[ch]' | xargs sed -r -i 's/^[ ]+\t/\t/'
Cc: Masahiro Yamada <masahiroy@kernel.org>
Signed-off-by: Hui Tang <tanghui20@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Hui Tang [Thu, 20 May 2021 03:47:47 +0000 (11:47 +0800)]
net: usb: remove leading spaces before tabs
There are a few leading spaces before tabs and remove it by running
the following commard:
$ find . -name '*.[ch]' | xargs sed -r -i 's/^[ ]+\t/\t/'
Cc: Hayes Wang <hayeswang@realtek.com>
Signed-off-by: Hui Tang <tanghui20@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Hui Tang [Thu, 20 May 2021 03:47:46 +0000 (11:47 +0800)]
net: wan: remove leading spaces before tabs
There are a few leading spaces before tabs and remove it by running
the following commard:
$ find . -name '*.[ch]' | xargs sed -r -i 's/^[ ]+\t/\t/'
Cc: Xie He <xie.he.0141@gmail.com>
Signed-off-by: Hui Tang <tanghui20@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 20 May 2021 22:01:04 +0000 (15:01 -0700)]
Merge branch 'hns3-debugfs'
Huazhong Tan says:
====================
net: hns3: refactor some debugfs commands
This series refactors the debugfs command to the new
process and removes the useless debugfs file node cmd
for the HNS3 ethernet driver.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Yufeng Mo [Thu, 20 May 2021 02:21:44 +0000 (10:21 +0800)]
net: hns3: remove the useless debugfs file node cmd
Currently, all debugfs commands have been reconstructed, and the
debugfs file node cmd is useless. So remove this debugfs file node.
Signed-off-by: Yufeng Mo <moyufeng@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Yufeng Mo [Thu, 20 May 2021 02:21:43 +0000 (10:21 +0800)]
net: hns3: refactor dump serv info of debugfs
Currently, the debugfs command for serv info is implemented by
"echo xxxx > cmd", and record the inforamtion in dmesg. It's
unnecessary and heavy. To improve it, create a single file
"serv_info" for it, and query it by command "cat serv_info",
return the result to userspace, rather than record in dmesg.
The display style is below:
$ cat service_task_info
local_clock: [ 114.203321]
delta: 784(ms)
last_service_task_processed:
4294918512(jiffies)
last_service_task_cnt: 4
Signed-off-by: Yufeng Mo <moyufeng@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiaran Zhang [Thu, 20 May 2021 02:21:42 +0000 (10:21 +0800)]
net: hns3: refactor dump mac tnl status of debugfs
Currently, the debugfs command for dump mac tnl status is
implemented by "echo xxxx > cmd", and record the information
in dmesg. It's unnecessary and heavy. To improve it, create
a single file "mac_tnl_status" for it, and query it by command
"cat mac_tnl_status", return the result to userspace, rather
than record in dmesg.
The display style is below:
$ cat mac_tnl_status
Recently generated mac tnl interruption:
[0111204.175437] status = 0x30
[0154120.329912] status = 0x30
Signed-off-by: Jiaran Zhang <zhangjiaran@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Guangbin Huang [Thu, 20 May 2021 02:21:41 +0000 (10:21 +0800)]
net: hns3: refactor dump qs shaper of debugfs
Currently, user gets qset shaper parameters by implementing debugfs
command "echo dump qs shaper > cmd", this command will dump info in
dmesg. It's unnecessary and heavy.
As there is "tm_qset" file in tm directory for dump qset info, to
optimize these command, merge qset shaper parameters to tm_qset
file and use cat command to get them.
The display style is below:
$ cat tm_qset
ID MAP_PRI LINK_VLD MODE DWRR IR_B IR_U IR_S BS_B BS_S FLAG
0000 0 1 dwrr 100 150 7 0 5 20 0
0001 0 0 sp 0 150 7 0 5 20 0
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Guangbin Huang [Thu, 20 May 2021 02:21:40 +0000 (10:21 +0800)]
net: hns3: refactor dump qos buf cfg of debugfs
Currently, user gets qos buffer config by implementing debugfs command
"echo dump qos buf cfg > cmd", this command will dump info in dmesg.
It's unnecessary and heavy.
To optimize it, create a single file "qos_buf_cfg" in tm directory
and use cat command to get info. It will return info to userspace,
rather than record in dmesg.
The display style is below:
$ cat qos_buf_cfg
tx_packet_buf_tc_0: 0x120
tx_packet_buf_tc_1: 0x120
tx_packet_buf_tc_2: 0x120
tx_packet_buf_tc_3: 0x120
tx_packet_buf_tc_4: 0x0
tx_packet_buf_tc_5: 0x0
tx_packet_buf_tc_6: 0x0
tx_packet_buf_tc_7: 0x0
......
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Guangbin Huang [Thu, 20 May 2021 02:21:39 +0000 (10:21 +0800)]
net: hns3: refactor dump qos pri map of debugfs
Currently, user gets priority map by implementing debugfs command
"echo dump qos pri map > cmd", this command will dump info in dmesg.
It's unnecessary and heavy.
To optimize it, create a single file "qos_pri_map" in tm directory
and use cat command to get info. It will return info to userspace,
rather than record in dmesg.
The display style is below:
$ cat qos_pri_map
vlan_to_pri: 0
PRI TC
0 0
1 1
2 2
3 3
4 0
5 1
6 2
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Guangbin Huang [Thu, 20 May 2021 02:21:38 +0000 (10:21 +0800)]
net: hns3: refactor dump qos pause cfg of debugfs
Currently, user gets pause config by implementing debugfs command
"echo dump qos pause cfg > cmd", this command will dump info in dmesg.
It's unnecessary and heavy.
To optimize it, create a single file "qos_pause_cfg" in tm directory
and use cat command to get info. It will return info to userspace,
rather than record in dmesg.
The display style is below:
$ cat qos_pause_cfg
pause_trans_gap: 0x7f
pause_trans_time: 0xffff
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Guangbin Huang [Thu, 20 May 2021 02:21:37 +0000 (10:21 +0800)]
net: hns3: refactor dump tc of debugfs
Currently, user gets tc schedule info by implementing debugfs command
"echo dump tc > cmd", this command will dump info in dmesg. It's
unnecessary and heavy.
To optimize it, create a single file "tc_sch_info" and use cat command
to get info. It will return info to userspace, rather than record in
dmesg.
The display style is below:
$ cat tc_sch_info
enabled tc number: 4
weight_offset: 14
TC MODE WEIGHT
0 dwrr 25
1 dwrr 25
2 dwrr 25
3 dwrr 25
4 dwrr 0
5 dwrr 0
6 dwrr 0
7 dwrr 0
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Guangbin Huang [Thu, 20 May 2021 02:21:36 +0000 (10:21 +0800)]
net: hns3: refactor dump tm of debugfs
Currently, user gets some tm info by implementing debugfs command
"echo dump tm > cmd", this command will dump info in dmesg. It's
unnecessary and heavy.
In addition, the info of this command mixes info of qset, priority,
pg and port. Qset and priority have their own command to get info of
themself, so can remove info of qset and priority from this command.
To optimize it, create two new files "tm_pg", "tm_port" in tm directory
and use cat command to separately get info of pg and port.
The display style is below:
$ cat tm_pg
ID PRI_MAP MODE DWRR C_IR_B C_IR_U C_IR_S C_BS_B C_BS_S ...
00 0x1f dwrr 1 75 9 0 31 20 ...
$ cat tm_port
IR_B IR_U IR_S BS_B BS_S FLAG RATE(Mbps)
75 9 0 31 20 1 200000
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Guangbin Huang [Thu, 20 May 2021 02:21:35 +0000 (10:21 +0800)]
net: hns3: refactor dump tm map of debugfs
Currently, the debugfs command for tm map is implemented by
"echo xxxx > cmd", and record the information in dmesg. It's
unnecessary and heavy. To improve it, create a single file
"tm_map" for it, and query it by command "cat tm_map",
return the result to userspace, rather than record in dmesg.
As user can't specify queue id in cat command, driver will return info
of all queue id.
The display style is below:
$ cat tm_map
queue_id qset_id pri_id tc_id
0000 0000 00 00
INDEX | TM BP QSET MAPPING:
0000 |
00000000:
00000000:
00000000:
00000000:
00000000:
00000000:
00000000
0256 |
00000000:
00000000:
00000000:
00000000:
00000000:
00000002:
00000000
0512 |
00000000:
00000000:
00000000:
00000004:
00000000:
00000000:
00000000
0768 |
00000000:
00000008:
00000000:
00000000:
00000000:
00000000:
00000000
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Hao Chen [Thu, 20 May 2021 02:21:34 +0000 (10:21 +0800)]
net: hns3: refactor dump fd tcam of debugfs
Currently, the debugfs command for fd tcam is implemented by
"echo xxxx > cmd", and record the information in dmesg. It's
unnecessary and heavy. To improve it, create a single file
"fd_tcam" for it, and query it by command "cat fd_tcam",
return the result to userspace, rather than record in dmesg.
The display style is below:
$ cat fd_tcam
read result tcam key x(31):
00000000
00000000
00000000
08000000
00000600
00000000
00000000
00000000
00000000
00000000
00000000
00000000
00000000
read result tcam key y(31):
00000000
00000000
00000000
f7ff0000
0000f900
00000000
00000000
00000000
00000000
00000000
00000000
00000000
0000fff8
Signed-off-by: Hao Chen <chenhao288@hisilicon.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>