platform/kernel/linux-rpi.git
3 years agolibbpf: Simplify the return expression of bpf_object__init_maps function
Wang Hai [Wed, 9 Jun 2021 11:56:51 +0000 (19:56 +0800)]
libbpf: Simplify the return expression of bpf_object__init_maps function

There is no need for special treatment of the 'ret == 0' case.
This patch simplifies the return expression.

Signed-off-by: Wang Hai <wanghai38@huawei.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20210609115651.3392580-1-wanghai38@huawei.com
3 years agoselftests, bpf: Make docs tests fail more reliably
Joe Stringer [Tue, 8 Jun 2021 01:57:56 +0000 (18:57 -0700)]
selftests, bpf: Make docs tests fail more reliably

Previously, if rst2man caught errors, then these would be ignored and
the output file would be written anyway. This would allow developers to
introduce regressions in the docs comments in the BPF headers.

Additionally, even if you instruct rst2man to fail out, it will still
write out to the destination target file, so if you ran the tests twice
in a row it would always pass. Use a temporary file for the initial run
to ensure that if rst2man fails out under "--strict" mode, subsequent
runs will not automatically pass.

Tested via ./tools/testing/selftests/bpf/test_doc_build.sh

Signed-off-by: Joe Stringer <joe@cilium.io>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Quentin Monnet <quentin@isovalent.com>
Link: https://lore.kernel.org/bpf/20210608015756.340385-1-joe@cilium.io
3 years agolibbpf: Fix pr_warn type warnings on 32bit
Michal Suchanek [Fri, 4 Jun 2021 11:24:48 +0000 (13:24 +0200)]
libbpf: Fix pr_warn type warnings on 32bit

The printed value is ptrdiff_t and is formatted wiht %ld. This works on
64bit but produces a warning on 32bit. Fix the format specifier to %td.

Fixes: 67234743736a ("libbpf: Generate loader program out of BPF ELF file.")
Signed-off-by: Michal Suchanek <msuchanek@suse.de>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20210604112448.32297-1-msuchanek@suse.de
3 years agotools/bpftool: Fix cross-build
Jean-Philippe Brucker [Thu, 3 Jun 2021 17:05:16 +0000 (19:05 +0200)]
tools/bpftool: Fix cross-build

When the bootstrap and final bpftool have different architectures, we
need to build two distinct disasm.o objects. Add a recipe for the
bootstrap disasm.o.

After commit d510296d331a ("bpftool: Use syscall/loader program in
"prog load" and "gen skeleton" command.") cross-building bpftool didn't
work anymore, because the bootstrap bpftool was linked using objects
from different architectures:

  $ make O=/tmp/bpftool ARCH=arm64 CROSS_COMPILE=aarch64-linux-gnu- -C tools/bpf/bpftool/ V=1
  [...]
  aarch64-linux-gnu-gcc ... -c -MMD -o /tmp/bpftool/disasm.o /home/z/src/linux/kernel/bpf/disasm.c
  gcc ... -c -MMD -o /tmp/bpftool//bootstrap/main.o main.c
  gcc ... -o /tmp/bpftool//bootstrap/bpftool /tmp/bpftool//bootstrap/main.o ... /tmp/bpftool/disasm.o
  /usr/bin/ld: /tmp/bpftool/disasm.o: Relocations in generic ELF (EM: 183)
  /usr/bin/ld: /tmp/bpftool/disasm.o: Relocations in generic ELF (EM: 183)
  /usr/bin/ld: /tmp/bpftool/disasm.o: Relocations in generic ELF (EM: 183)
  /usr/bin/ld: /tmp/bpftool/disasm.o: error adding symbols: file in wrong format
  collect2: error: ld returned 1 exit status
  [...]

The final bpftool was built for e.g. arm64, while the bootstrap bpftool,
executed on the host, was built for x86. The problem here was that disasm.o
linked into the bootstrap bpftool was arm64 rather than x86. With the fix
we build two disasm.o, one for the target bpftool in arm64, and one for
the bootstrap bpftool in x86.

Fixes: d510296d331a ("bpftool: Use syscall/loader program in "prog load" and "gen skeleton" command.")
Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210603170515.1854642-1-jean-philippe@linaro.org
3 years agoselftests/bpf: Add xdp_redirect_multi into .gitignore
Andrii Nakryiko [Thu, 3 Jun 2021 00:40:26 +0000 (17:40 -0700)]
selftests/bpf: Add xdp_redirect_multi into .gitignore

When xdp_redirect_multi test binary was added recently, it wasn't added to
.gitignore. Fix that.

Fixes: d23292476297 ("selftests/bpf: Add xdp_redirect_multi test")
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20210603004026.2698513-5-andrii@kernel.org
3 years agolibbpf: Install skel_internal.h header used from light skeletons
Andrii Nakryiko [Thu, 3 Jun 2021 00:40:25 +0000 (17:40 -0700)]
libbpf: Install skel_internal.h header used from light skeletons

Light skeleton code assumes skel_internal.h header to be installed system-wide
by libbpf package. Make sure it is actually installed.

Fixes: 67234743736a ("libbpf: Generate loader program out of BPF ELF file.")
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20210603004026.2698513-4-andrii@kernel.org
3 years agolibbpf: Refactor header installation portions of Makefile
Andrii Nakryiko [Thu, 3 Jun 2021 00:40:24 +0000 (17:40 -0700)]
libbpf: Refactor header installation portions of Makefile

As we gradually get more headers that have to be installed, it's quite
annoying to copy/paste long $(call) commands. So extract that logic and do
a simple $(foreach) over the list of headers.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20210603004026.2698513-3-andrii@kernel.org
3 years agolibbpf: Move few APIs from 0.4 to 0.5 version
Andrii Nakryiko [Thu, 3 Jun 2021 00:40:23 +0000 (17:40 -0700)]
libbpf: Move few APIs from 0.4 to 0.5 version

Official libbpf 0.4 release doesn't include three APIs that were tentatively
put into 0.4 section. Fix libbpf.map and move these three APIs:

  - bpf_map__initial_value;
  - bpf_map_lookup_and_delete_elem_flags;
  - bpf_object__gen_loader.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20210603004026.2698513-2-andrii@kernel.org
3 years agobpf, tnums: Provably sound, faster, and more precise algorithm for tnum_mul
Harishankar Vishwanathan [Mon, 31 May 2021 02:01:57 +0000 (22:01 -0400)]
bpf, tnums: Provably sound, faster, and more precise algorithm for tnum_mul

This patch introduces a new algorithm for multiplication of tristate
numbers (tnums) that is provably sound. It is faster and more precise when
compared to the existing method.

Like the existing method, this new algorithm follows the long
multiplication algorithm. The idea is to generate partial products by
multiplying each bit in the multiplier (tnum a) with the multiplicand
(tnum b), and adding the partial products after appropriately bit-shifting
them. The new algorithm, however, uses just a single loop over the bits of
the multiplier (tnum a) and accumulates only the uncertain components of
the multiplicand (tnum b) into a mask-only tnum. The following paper
explains the algorithm in more detail: https://arxiv.org/abs/2105.05398.

A natural way to construct the tnum product is by performing a tnum
addition on all the partial products. This algorithm presents another
method of doing this: decompose each partial product into two tnums,
consisting of the values and the masks separately. The mask-sum is
accumulated within the loop in acc_m. The value-sum tnum is generated
using a.value * b.value. The tnum constructed by tnum addition of the
value-sum and the mask-sum contains all possible summations of concrete
values drawn from the partial product tnums pairwise. We prove this result
in the paper.

Our evaluations show that the new algorithm is overall more precise
(producing tnums with less uncertain components) than the existing method.
As an illustrative example, consider the input tnums A and B. The numbers
in the parenthesis correspond to (value;mask).

  A                = 000000x1 (1;2)
  B                = 0010011x (38;1)
  A * B (existing) = xxxxxxxx (0;255)
  A * B (new)      = 0x1xxxxx (32;95)

Importantly, we present a proof of soundness of the new algorithm in the
aforementioned paper. Additionally, we show that this new algorithm is
empirically faster than the existing method.

Co-developed-by: Matan Shachnai <m.shachnai@rutgers.edu>
Co-developed-by: Srinivas Narayana <srinivas.narayana@rutgers.edu>
Co-developed-by: Santosh Nagarakatte <santosh.nagarakatte@rutgers.edu>
Signed-off-by: Matan Shachnai <m.shachnai@rutgers.edu>
Signed-off-by: Srinivas Narayana <srinivas.narayana@rutgers.edu>
Signed-off-by: Santosh Nagarakatte <santosh.nagarakatte@rutgers.edu>
Signed-off-by: Harishankar Vishwanathan <harishankar.vishwanathan@rutgers.edu>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Edward Cree <ecree.xilinx@gmail.com>
Link: https://arxiv.org/abs/2105.05398
Link: https://lore.kernel.org/bpf/20210531020157.7386-1-harishankar.vishwanathan@rutgers.edu
3 years agobpf, devmap: Remove drops variable from bq_xmit_all()
Hangbin Liu [Fri, 28 May 2021 02:43:56 +0000 (22:43 -0400)]
bpf, devmap: Remove drops variable from bq_xmit_all()

As Colin pointed out, the first drops assignment after declaration will
be overwritten by the second drops assignment before using, which makes
it useless.

Since the drops variable will be used only once. Just remove it and
use "cnt - sent" in trace_xdp_devmap_xmit().

Fixes: cb261b594b41 ("bpf: Run devmap xdp_prog on flush instead of bulk enqueue")
Reported-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20210528024356.24333-1-liuhangbin@gmail.com
3 years agobpf, docs: Add llvm_reloc.rst to explain llvm bpf relocations
Yonghong Song [Wed, 26 May 2021 15:24:57 +0000 (08:24 -0700)]
bpf, docs: Add llvm_reloc.rst to explain llvm bpf relocations

LLVM upstream commit https://reviews.llvm.org/D102712 made some changes
to bpf relocations to make them llvm linker lld friendly. The scope of
existing relocations R_BPF_64_{64,32} is narrowed and new relocations
R_BPF_64_{ABS32,ABS64,NODYLD32} are introduced.

Let us add some documentation about llvm bpf relocations so people can
understand how to resolve them properly in their respective tools.

Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20210526152457.335210-1-yhs@fb.com
3 years agolibbpf: Move BPF_SEQ_PRINTF and BPF_SNPRINTF to bpf_helpers.h
Florent Revest [Wed, 26 May 2021 16:46:43 +0000 (18:46 +0200)]
libbpf: Move BPF_SEQ_PRINTF and BPF_SNPRINTF to bpf_helpers.h

These macros are convenient wrappers around the bpf_seq_printf and
bpf_snprintf helpers. They are currently provided by bpf_tracing.h which
targets low level tracing primitives. bpf_helpers.h is a better fit.

The __bpf_narg and __bpf_apply are needed in both files and provided
twice. __bpf_empty isn't used anywhere and is removed from bpf_tracing.h

Reported-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Florent Revest <revest@chromium.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210526164643.2881368-1-revest@chromium.org
3 years agoMerge branch 'bpf-xdp-bcast'
Daniel Borkmann [Wed, 26 May 2021 07:46:17 +0000 (09:46 +0200)]
Merge branch 'bpf-xdp-bcast'

Hangbin Liu says:

====================
This patchset is a new implementation for XDP multicast support based
on my previous 2 maps implementation[1]. The reason is that Daniel thinks
the exclude map implementation is missing proper bond support in XDP
context. And there is a plan to add native XDP bonding support. Adding
a exclude map in the helper also increases the complexity of verifier and
has drawbacks on performance.

The new implementation just add two new flags BPF_F_BROADCAST and
BPF_F_EXCLUDE_INGRESS to extend xdp_redirect_map for broadcast support.

With BPF_F_BROADCAST the packet will be broadcasted to all the interfaces
in the map. with BPF_F_EXCLUDE_INGRESS the ingress interface will be
excluded when do broadcasting.

The patchv11 link is here [2].

  [1] https://lore.kernel.org/bpf/20210223125809.1376577-1-liuhangbin@gmail.com
  [2] https://lore.kernel.org/bpf/20210513070447.1878448-1-liuhangbin@gmail.com

v12: As Daniel pointed out:
  a) defined as const u64 for flag_mask and action_mask in
     __bpf_xdp_redirect_map()
  b) remove BPF_F_ACTION_MASK in uapi header
  c) remove EXPORT_SYMBOL_GPL for xdpf_clone()

v11:
  a) Use unlikely() when checking if this is for broadcast redirecting.
  b) Fix a tracepoint NULL pointer issue Jesper found
  c) Remove BPF_F_REDIR_MASK and just use OR flags to make the reader more
     clear about what's flags we are using
  d) Add the performace number with multi veth interfaces in patch 01
     description.
  e) remove some sleeps to reduce the testing time in patch04. Re-struct the
     test and make clear what flags we are testing.

v10: use READ/WRITE_ONCE when read/write map instead of xchg()
v9: Update patch 01 commit description
v8: use hlist_for_each_entry_rcu() when looping the devmap hash ojbs
v7: No need to free xdpf in dev_map_enqueue_clone() if xdpf_clone failed.
v6: Fix a skb leak in the error path for generic XDP
v5: Just walk the map directly to get interfaces as get_next_key() of devmap
    hash may restart looping from the first key if the device get removed.
    After update the performace has improved 10% compired with v4.
v4: Fix flags never cleared issue in patch 02. Update selftest to cover this.
v3: Rebase the code based on latest bpf-next
v2: fix flag renaming issue in patch 02
====================

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
3 years agoselftests/bpf: Add xdp_redirect_multi test
Hangbin Liu [Wed, 19 May 2021 09:07:47 +0000 (17:07 +0800)]
selftests/bpf: Add xdp_redirect_multi test

Add a bpf selftest for new helper xdp_redirect_map_multi(). In this
test there are 3 forward groups and 1 exclude group. The test will
redirect each interface's packets to all the interfaces in the forward
group, and exclude the interface in exclude map.

Two maps (DEVMAP, DEVMAP_HASH) and two xdp modes (generic, drive) will
be tested. XDP egress program will also be tested by setting pkt src MAC
to egress interface's MAC address.

For more test details, you can find it in the test script. Here is
the test result.
]# time ./test_xdp_redirect_multi.sh
Pass: xdpgeneric arp(F_BROADCAST) ns1-1
Pass: xdpgeneric arp(F_BROADCAST) ns1-2
Pass: xdpgeneric arp(F_BROADCAST) ns1-3
Pass: xdpgeneric IPv4 (F_BROADCAST|F_EXCLUDE_INGRESS) ns1-1
Pass: xdpgeneric IPv4 (F_BROADCAST|F_EXCLUDE_INGRESS) ns1-2
Pass: xdpgeneric IPv4 (F_BROADCAST|F_EXCLUDE_INGRESS) ns1-3
Pass: xdpgeneric IPv6 (no flags) ns1-1
Pass: xdpgeneric IPv6 (no flags) ns1-2
Pass: xdpdrv arp(F_BROADCAST) ns1-1
Pass: xdpdrv arp(F_BROADCAST) ns1-2
Pass: xdpdrv arp(F_BROADCAST) ns1-3
Pass: xdpdrv IPv4 (F_BROADCAST|F_EXCLUDE_INGRESS) ns1-1
Pass: xdpdrv IPv4 (F_BROADCAST|F_EXCLUDE_INGRESS) ns1-2
Pass: xdpdrv IPv4 (F_BROADCAST|F_EXCLUDE_INGRESS) ns1-3
Pass: xdpdrv IPv6 (no flags) ns1-1
Pass: xdpdrv IPv6 (no flags) ns1-2
Pass: xdpegress mac ns1-2
Pass: xdpegress mac ns1-3
Summary: PASS 18, FAIL 0

real    1m18.321s
user    0m0.123s
sys     0m0.350s

Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://lore.kernel.org/bpf/20210519090747.1655268-5-liuhangbin@gmail.com
3 years agosample/bpf: Add xdp_redirect_map_multi for redirect_map broadcast test
Hangbin Liu [Wed, 19 May 2021 09:07:46 +0000 (17:07 +0800)]
sample/bpf: Add xdp_redirect_map_multi for redirect_map broadcast test

This is a sample for xdp redirect broadcast. In the sample we could forward
all packets between given interfaces. There is also an option -X that could
enable 2nd xdp_prog on egress interface.

Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://lore.kernel.org/bpf/20210519090747.1655268-4-liuhangbin@gmail.com
3 years agoxdp: Extend xdp_redirect_map with broadcast support
Hangbin Liu [Wed, 19 May 2021 09:07:45 +0000 (17:07 +0800)]
xdp: Extend xdp_redirect_map with broadcast support

This patch adds two flags BPF_F_BROADCAST and BPF_F_EXCLUDE_INGRESS to
extend xdp_redirect_map for broadcast support.

With BPF_F_BROADCAST the packet will be broadcasted to all the interfaces
in the map. with BPF_F_EXCLUDE_INGRESS the ingress interface will be
excluded when do broadcasting.

When getting the devices in dev hash map via dev_map_hash_get_next_key(),
there is a possibility that we fall back to the first key when a device
was removed. This will duplicate packets on some interfaces. So just walk
the whole buckets to avoid this issue. For dev array map, we also walk the
whole map to find valid interfaces.

Function bpf_clear_redirect_map() was removed in
commit ee75aef23afe ("bpf, xdp: Restructure redirect actions").
Add it back as we need to use ri->map again.

With test topology:
  +-------------------+             +-------------------+
  | Host A (i40e 10G) |  ---------- | eno1(i40e 10G)    |
  +-------------------+             |                   |
                                    |   Host B          |
  +-------------------+             |                   |
  | Host C (i40e 10G) |  ---------- | eno2(i40e 10G)    |
  +-------------------+             |                   |
                                    |          +------+ |
                                    | veth0 -- | Peer | |
                                    | veth1 -- |      | |
                                    | veth2 -- |  NS  | |
                                    |          +------+ |
                                    +-------------------+

On Host A:
 # pktgen/pktgen_sample03_burst_single_flow.sh -i eno1 -d $dst_ip -m $dst_mac -s 64

On Host B(Intel(R) Xeon(R) CPU E5-2690 v3 @ 2.60GHz, 128G Memory):
Use xdp_redirect_map and xdp_redirect_map_multi in samples/bpf for testing.
All the veth peers in the NS have a XDP_DROP program loaded. The
forward_map max_entries in xdp_redirect_map_multi is modify to 4.

Testing the performance impact on the regular xdp_redirect path with and
without patch (to check impact of additional check for broadcast mode):

5.12 rc4         | redirect_map        i40e->i40e      |    2.0M |  9.7M
5.12 rc4         | redirect_map        i40e->veth      |    1.7M | 11.8M
5.12 rc4 + patch | redirect_map        i40e->i40e      |    2.0M |  9.6M
5.12 rc4 + patch | redirect_map        i40e->veth      |    1.7M | 11.7M

Testing the performance when cloning packets with the redirect_map_multi
test, using a redirect map size of 4, filled with 1-3 devices:

5.12 rc4 + patch | redirect_map multi  i40e->veth (x1) |    1.7M | 11.4M
5.12 rc4 + patch | redirect_map multi  i40e->veth (x2) |    1.1M |  4.3M
5.12 rc4 + patch | redirect_map multi  i40e->veth (x3) |    0.8M |  2.6M

Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Link: https://lore.kernel.org/bpf/20210519090747.1655268-3-liuhangbin@gmail.com
3 years agobpf: Run devmap xdp_prog on flush instead of bulk enqueue
Jesper Dangaard Brouer [Wed, 19 May 2021 09:07:44 +0000 (17:07 +0800)]
bpf: Run devmap xdp_prog on flush instead of bulk enqueue

This changes the devmap XDP program support to run the program when the
bulk queue is flushed instead of before the frame is enqueued. This has
a couple of benefits:

- It "sorts" the packets by destination devmap entry, and then runs the
  same BPF program on all the packets in sequence. This ensures that we
  keep the XDP program and destination device properties hot in I-cache.

- It makes the multicast implementation simpler because it can just
  enqueue packets using bq_enqueue() without having to deal with the
  devmap program at all.

The drawback is that if the devmap program drops the packet, the enqueue
step is redundant. However, arguably this is mostly visible in a
micro-benchmark, and with more mixed traffic the I-cache benefit should
win out. The performance impact of just this patch is as follows:

Using 2 10Gb i40e NIC, redirecting one to another, or into a veth interface,
which do XDP_DROP on veth peer. With xdp_redirect_map in sample/bpf, send
pkts via pktgen cmd:
./pktgen_sample03_burst_single_flow.sh -i eno1 -d $dst_ip -m $dst_mac -t 10 -s 64

There are about +/- 0.1M deviation for native testing, the performance
improved for the base-case, but some drop back with xdp devmap prog attached.

Version          | Test                           | Generic | Native | Native + 2nd xdp_prog
5.12 rc4         | xdp_redirect_map   i40e->i40e  |    1.9M |   9.6M |  8.4M
5.12 rc4         | xdp_redirect_map   i40e->veth  |    1.7M |  11.7M |  9.8M
5.12 rc4 + patch | xdp_redirect_map   i40e->i40e  |    1.9M |   9.8M |  8.0M
5.12 rc4 + patch | xdp_redirect_map   i40e->veth  |    1.7M |  12.0M |  9.4M

When bq_xmit_all() is called from bq_enqueue(), another packet will
always be enqueued immediately after, so clearing dev_rx, xdp_prog and
flush_node in bq_xmit_all() is redundant. Move the clear to __dev_flush(),
and only check them once in bq_enqueue() since they are all modified
together.

This change also has the side effect of extending the lifetime of the
RCU-protected xdp_prog that lives inside the devmap entries: Instead of
just living for the duration of the XDP program invocation, the
reference now lives all the way until the bq is flushed. This is safe
because the bq flush happens at the end of the NAPI poll loop, so
everything happens between a local_bh_disable()/local_bh_enable() pair.
However, this is by no means obvious from looking at the call sites; in
particular, some drivers have an additional rcu_read_lock() around only
the XDP program invocation, which only confuses matters further.
Cleaning this up will be done in a separate patch series.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20210519090747.1655268-2-liuhangbin@gmail.com
3 years agoMerge branch 'libbpf: error reporting changes for v1.0'
Alexei Starovoitov [Wed, 26 May 2021 00:32:35 +0000 (17:32 -0700)]
Merge branch 'libbpf: error reporting changes for v1.0'

Andrii Nakryiko says:

====================

Implement error reporting changes discussed in "Libbpf: the road to v1.0"
([0]) document.

Libbpf gets a new API, libbpf_set_strict_mode() which accepts a set of flags
that turn on a set of libbpf 1.0 changes, that might be potentially breaking.
It's possible to opt-in into all current and future 1.0 features by specifying
LIBBPF_STRICT_ALL flag.

When some of the 1.0 "features" are requested, libbpf APIs might behave
differently. In this patch set a first set of changes are implemented, all
related to the way libbpf returns errors. See individual patches for details.

Patch #1 adds a no-op libbpf_set_strict_mode() functionality to enable
updating selftests.

Patch #2 gets rid of all the bad code patterns that will break in libbpf 1.0
(exact -1 comparison for low-level APIs, direct IS_ERR() macro usage to check
pointer-returning APIs for error, etc). These changes make selftest work in
both legacy and 1.0 libbpf modes. Selftests also opt-in into 100% libbpf 1.0
mode to automatically gain all the subsequent changes, which will come in
follow up patches.

Patch #3 streamlines error reporting for low-level APIs wrapping bpf() syscall.

Patch #4 streamlines errors for all the rest APIs.

Patch #5 ensures that BPF skeletons propagate errors properly as well, as
currently on error some APIs will return NULL with no way of checking exact
error code.

  [0] https://docs.google.com/document/d/1UyjTZuPFWiPFyKk1tV5an11_iaRuec6U-ZESZ54nNTY

v1->v2:
  - move libbpf_set_strict_mode() implementation to patch #1, where it belongs
    (Alexei);
  - add acks, slight rewording of commit messages.
====================

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
3 years agobpftool: Set errno on skeleton failures and propagate errors
Andrii Nakryiko [Tue, 25 May 2021 03:59:35 +0000 (20:59 -0700)]
bpftool: Set errno on skeleton failures and propagate errors

Follow libbpf's error handling conventions and pass through errors and errno
properly. Skeleton code always returned NULL on errors (not ERR_PTR(err)), so
there are no backwards compatibility concerns. But now we also set errno
properly, so it's possible to distinguish different reasons for failure, if
necessary.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://lore.kernel.org/bpf/20210525035935.1461796-6-andrii@kernel.org
3 years agolibbpf: Streamline error reporting for high-level APIs
Andrii Nakryiko [Tue, 25 May 2021 03:59:34 +0000 (20:59 -0700)]
libbpf: Streamline error reporting for high-level APIs

Implement changes to error reporting for high-level libbpf APIs to make them
less surprising and less error-prone to users:
  - in all the cases when error happens, errno is set to an appropriate error
    value;
  - in libbpf 1.0 mode, all pointer-returning APIs return NULL on error and
    error code is communicated through errno; this applies both to APIs that
    already returned NULL before (so now they communicate more detailed error
    codes), as well as for many APIs that used ERR_PTR() macro and encoded
    error numbers as fake pointers.
  - in legacy (default) mode, those APIs that were returning ERR_PTR(err),
    continue doing so, but still set errno.

With these changes, errno can be always used to extract actual error,
regardless of legacy or libbpf 1.0 modes. This is utilized internally in
libbpf in places where libbpf uses it's own high-level APIs.
libbpf_get_error() is adapted to handle both cases completely transparently to
end-users (and is used by libbpf consistently as well).

More context, justification, and discussion can be found in "Libbpf: the road
to v1.0" document ([0]).

  [0] https://docs.google.com/document/d/1UyjTZuPFWiPFyKk1tV5an11_iaRuec6U-ZESZ54nNTY

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://lore.kernel.org/bpf/20210525035935.1461796-5-andrii@kernel.org
3 years agolibbpf: Streamline error reporting for low-level APIs
Andrii Nakryiko [Tue, 25 May 2021 03:59:33 +0000 (20:59 -0700)]
libbpf: Streamline error reporting for low-level APIs

Ensure that low-level APIs behave uniformly across the libbpf as follows:
  - in case of an error, errno is always set to the correct error code;
  - when libbpf 1.0 mode is enabled with LIBBPF_STRICT_DIRECT_ERRS option to
    libbpf_set_strict_mode(), return -Exxx error value directly, instead of -1;
  - by default, until libbpf 1.0 is released, keep returning -1 directly.

More context, justification, and discussion can be found in "Libbpf: the road
to v1.0" document ([0]).

  [0] https://docs.google.com/document/d/1UyjTZuPFWiPFyKk1tV5an11_iaRuec6U-ZESZ54nNTY

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://lore.kernel.org/bpf/20210525035935.1461796-4-andrii@kernel.org
3 years agoselftests/bpf: Turn on libbpf 1.0 mode and fix all IS_ERR checks
Andrii Nakryiko [Tue, 25 May 2021 03:59:32 +0000 (20:59 -0700)]
selftests/bpf: Turn on libbpf 1.0 mode and fix all IS_ERR checks

Turn ony libbpf 1.0 mode. Fix all the explicit IS_ERR checks that now will be
broken because libbpf returns NULL on error (and sets errno). Fix
ASSERT_OK_PTR and ASSERT_ERR_PTR to work for both old mode and new modes and
use them throughout selftests. This is trivial to do by using
libbpf_get_error() API that all libbpf users are supposed to use, instead of
IS_ERR checks.

A bunch of checks also did explicit -1 comparison for various fd-returning
APIs. Such checks are replaced with >= 0 or < 0 cases.

There were also few misuses of bpf_object__find_map_by_name() in test_maps.
Those are fixed in this patch as well.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://lore.kernel.org/bpf/20210525035935.1461796-3-andrii@kernel.org
3 years agolibbpf: Add libbpf_set_strict_mode() API to turn on libbpf 1.0 behaviors
Andrii Nakryiko [Tue, 25 May 2021 03:59:31 +0000 (20:59 -0700)]
libbpf: Add libbpf_set_strict_mode() API to turn on libbpf 1.0 behaviors

Add libbpf_set_strict_mode() API that allows application to simulate libbpf
1.0 breaking changes before libbpf 1.0 is released. This will help users
migrate gradually and with confidence.

For now only ALL or NONE options are available, subsequent patches will add
more flags. This patch is preliminary for selftests/bpf changes.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://lore.kernel.org/bpf/20210525035935.1461796-2-andrii@kernel.org
3 years agoxsk: Use kvcalloc to support large umems
Magnus Karlsson [Fri, 21 May 2021 08:33:01 +0000 (10:33 +0200)]
xsk: Use kvcalloc to support large umems

Use kvcalloc() instead of kcalloc() to support large umems with, on my
server, one million pages or more in the umem.

Reported-by: Dan Siemon <dan@coverfire.com>
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Björn Töpel <bjorn@kernel.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20210521083301.26921-1-magnus.karlsson@gmail.com
3 years agobpf: Fix spelling mistakes
Zhen Lei [Tue, 25 May 2021 02:56:59 +0000 (10:56 +0800)]
bpf: Fix spelling mistakes

Fix some spelling mistakes in comments:
aother ==> another
Netiher ==> Neither
desribe ==> describe
intializing ==> initializing
funciton ==> function
wont ==> won't and move the word 'the' at the end to the next line
accross ==> across
pathes ==> paths
triggerred ==> triggered
excute ==> execute
ether ==> either
conervative ==> conservative
convetion ==> convention
markes ==> marks
interpeter ==> interpreter

Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210525025659.8898-2-thunder.leizhen@huawei.com
3 years agosamples: bpf: Ix kernel-doc syntax in file header
Aditya Srivastava [Sun, 23 May 2021 15:14:08 +0000 (20:44 +0530)]
samples: bpf: Ix kernel-doc syntax in file header

The opening comment mark '/**' is used for highlighting the beginning of
kernel-doc comments.
The header for samples/bpf/ibumad_kern.c follows this syntax, but
the content inside does not comply with kernel-doc.

This line was probably not meant for kernel-doc parsing, but is parsed
due to the presence of kernel-doc like comment syntax(i.e, '/**'), which
causes unexpected warnings from kernel-doc:
warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst
 * ibumad BPF sample kernel side

Provide a simple fix by replacing this occurrence with general comment
format, i.e. '/*', to prevent kernel-doc from parsing it.

Signed-off-by: Aditya Srivastava <yashsri421@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Link: https://lore.kernel.org/bpf/20210523151408.22280-1-yashsri421@gmail.com
3 years agolibbpf: Add support for new llvm bpf relocations
Yonghong Song [Sat, 22 May 2021 16:23:41 +0000 (09:23 -0700)]
libbpf: Add support for new llvm bpf relocations

LLVM patch https://reviews.llvm.org/D102712
narrowed the scope of existing R_BPF_64_64
and R_BPF_64_32 relocations, and added three
new relocations, R_BPF_64_ABS64, R_BPF_64_ABS32
and R_BPF_64_NODYLD32. The main motivation is
to make relocations linker friendly.

This change, unfortunately, breaks libbpf build,
and we will see errors like below:
  libbpf: ELF relo #0 in section #6 has unexpected type 2 in
     /home/yhs/work/bpf-next/tools/testing/selftests/bpf/bpf_tcp_nogpl.o
  Error: failed to link
     '/home/yhs/work/bpf-next/tools/testing/selftests/bpf/bpf_tcp_nogpl.o':
     Unknown error -22 (-22)
The new relocation R_BPF_64_ABS64 is generated
and libbpf linker sanity check doesn't understand it.
Relocation section '.rel.struct_ops' at offset 0x1410 contains 1 entries:
    Offset             Info             Type               Symbol's Value  Symbol's Name
0000000000000018  0000000700000002 R_BPF_64_ABS64         0000000000000000 nogpltcp_init

Look at the selftests/bpf/bpf_tcp_nogpl.c,
  void BPF_STRUCT_OPS(nogpltcp_init, struct sock *sk)
  {
  }

  SEC(".struct_ops")
  struct tcp_congestion_ops bpf_nogpltcp = {
          .init           = (void *)nogpltcp_init,
          .name           = "bpf_nogpltcp",
  };
The new llvm relocation scheme categorizes 'nogpltcp_init' reference
as R_BPF_64_ABS64 instead of R_BPF_64_64 which is used to specify
ld_imm64 relocation in the new scheme.

Let us fix the linker sanity checking by including
R_BPF_64_ABS64 and R_BPF_64_ABS32. There is no need to
check R_BPF_64_NODYLD32 which is used for .BTF and .BTF.ext.

Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20210522162341.3687617-1-yhs@fb.com
3 years agoMerge branch 'Add lookup_and_delete_elem support to BPF hash map types'
Andrii Nakryiko [Mon, 24 May 2021 20:30:53 +0000 (13:30 -0700)]
Merge branch 'Add lookup_and_delete_elem support to BPF hash map types'

Denis Salopek says:

====================

This patch series extends the existing bpf_map_lookup_and_delete_elem()
functionality with 4 more map types:
 - BPF_MAP_TYPE_HASH,
 - BPF_MAP_TYPE_PERCPU_HASH,
 - BPF_MAP_TYPE_LRU_HASH and
 - BPF_MAP_TYPE_LRU_PERCPU_HASH.

Patch 1 adds most of its functionality and logic as well as
documentation.

As it was previously limited to only stacks and queues which do not
support the BPF_F_LOCK flag, patch 2 enables its usage by adding a new
libbpf API bpf_map_lookup_and_delete_elem_flags() based on the existing
bpf_map_lookup_elem_flags().

Patch 3 adds selftests for lookup_and_delete_elem().

Changes in patch 1:
v7: Minor formating nits, add Acked-by.
v6: Remove unneeded flag check, minor code/format fixes.
v5: Split patch to 3 patches. Extend BPF_MAP_LOOKUP_AND_DELETE_ELEM
documentation with this changes.
v4: Fix the return value for unsupported map types.
v3: Add bpf_map_lookup_and_delete_elem_flags() and enable BPF_F_LOCK
flag, change CHECKs to ASSERT_OKs, initialize variables to 0.
v2: Add functionality for LRU/per-CPU, add test_progs tests.

Changes in patch 2:
v7: No change.
v6: Add Acked-by.
v5: Move to the newest libbpf version (0.4.0).

Changes in patch 3:
v7: Remove ASSERT_GE macro which is already added in some other commit,
change ASSERT_OK to ASSERT_OK_PTR, add Acked-by.
v6: Remove PERCPU macros, add ASSERT_GE macro to test_progs.h, remove
leftover code.
v5: Use more appropriate macros. Better check for changed value.
====================

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
3 years agoselftests/bpf: Add bpf_lookup_and_delete_elem tests
Denis Salopek [Tue, 11 May 2021 21:00:06 +0000 (23:00 +0200)]
selftests/bpf: Add bpf_lookup_and_delete_elem tests

Add bpf selftests and extend existing ones for a new function
bpf_lookup_and_delete_elem() for (percpu) hash and (percpu) LRU hash map
types.
In test_lru_map and test_maps we add an element, lookup_and_delete it,
then check whether it's deleted.
The newly added lookup_and_delete prog tests practically do the same
thing but additionally use a BPF program to change the value of the
element for LRU maps.

Signed-off-by: Denis Salopek <denis.salopek@sartura.hr>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/d30d3e0060c1f750e133579623cf1c60ff58f3d9.1620763117.git.denis.salopek@sartura.hr
3 years agobpf: Extend libbpf with bpf_map_lookup_and_delete_elem_flags
Denis Salopek [Tue, 11 May 2021 21:00:05 +0000 (23:00 +0200)]
bpf: Extend libbpf with bpf_map_lookup_and_delete_elem_flags

Add bpf_map_lookup_and_delete_elem_flags() libbpf API in order to use
the BPF_F_LOCK flag with the map_lookup_and_delete_elem() function.

Signed-off-by: Denis Salopek <denis.salopek@sartura.hr>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/15b05dafe46c7e0750d110f233977372029d1f62.1620763117.git.denis.salopek@sartura.hr
3 years agobpf: Add lookup_and_delete_elem support to hashtab
Denis Salopek [Tue, 11 May 2021 21:00:04 +0000 (23:00 +0200)]
bpf: Add lookup_and_delete_elem support to hashtab

Extend the existing bpf_map_lookup_and_delete_elem() functionality to
hashtab map types, in addition to stacks and queues.
Create a new hashtab bpf_map_ops function that does lookup and deletion
of the element under the same bucket lock and add the created map_ops to
bpf.h.

Signed-off-by: Denis Salopek <denis.salopek@sartura.hr>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/4d18480a3e990ffbf14751ddef0325eed3be2966.1620763117.git.denis.salopek@sartura.hr
3 years agolibbpf: Skip bpf_object__probe_loading for light skeleton
Stanislav Fomichev [Fri, 21 May 2021 03:06:53 +0000 (20:06 -0700)]
libbpf: Skip bpf_object__probe_loading for light skeleton

I'm getting the following error when running 'gen skeleton -L' as
regular user:

libbpf: Error in bpf_object__probe_loading():Operation not permitted(1).
Couldn't load trivial BPF program. Make sure your kernel supports BPF
(CONFIG_BPF_SYSCALL=y) and/or that RLIMIT_MEMLOCK is set to big enough
value.

Fixes: 67234743736a ("libbpf: Generate loader program out of BPF ELF file.")
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210521030653.2626513-1-sdf@google.com
3 years agoethernet: ucc_geth: Use kmemdup() rather than kmalloc+memcpy
YueHaibing [Mon, 24 May 2021 01:07:01 +0000 (09:07 +0800)]
ethernet: ucc_geth: Use kmemdup() rather than kmalloc+memcpy

Issue identified with Coccinelle.

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: r6040: Allow restarting auto-negotiation
Florian Fainelli [Sun, 23 May 2021 15:58:42 +0000 (08:58 -0700)]
net: r6040: Allow restarting auto-negotiation

Use phy_ethtool_nway_reset() since the driver makes use of the PHY
library.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoMerge branch 'r6040-cleanups'
David S. Miller [Mon, 24 May 2021 00:20:53 +0000 (17:20 -0700)]
Merge branch 'r6040-cleanups'

Florian Fainelli says:

====================
net: r6040: Non-functional changes

These two patches clean up the r6040 driver a little bit in preparation
for adding additional features such as dumping MAC counters and properly
dealing with DMA-API mapping.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: r6040: Use ETH_FCS_LEN
Florian Fainelli [Sun, 23 May 2021 15:54:11 +0000 (08:54 -0700)]
net: r6040: Use ETH_FCS_LEN

Instead of the open coded constant 4.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: r6040: Use logical or for MDIO operations
Florian Fainelli [Sun, 23 May 2021 15:54:10 +0000 (08:54 -0700)]
net: r6040: Use logical or for MDIO operations

This is not a functional change, but we should be using a logical or to
assign the bits we will be writing to the MDIO read and write registers.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoehea: Use DEVICE_ATTR_*() macro
YueHaibing [Sun, 23 May 2021 06:02:23 +0000 (14:02 +0800)]
ehea: Use DEVICE_ATTR_*() macro

Use DEVICE_ATTR_*() helper instead of plain DEVICE_ATTR,
which makes the code a bit shorter and easier to read.

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agosfc: falcon: use DEVICE_ATTR_*() macro
YueHaibing [Sun, 23 May 2021 03:24:09 +0000 (11:24 +0800)]
sfc: falcon: use DEVICE_ATTR_*() macro

Use DEVICE_ATTR_*() helper instead of plain DEVICE_ATTR,
which makes the code a bit shorter and easier to read.

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agosfc: use DEVICE_ATTR_*() macro
YueHaibing [Sun, 23 May 2021 03:20:30 +0000 (11:20 +0800)]
sfc: use DEVICE_ATTR_*() macro

Use DEVICE_ATTR_*() helper instead of plain DEVICE_ATTR,
which makes the code a bit shorter and easier to read.

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: ftgmac100: add missing error return code in ftgmac100_probe()
Yang Yingliang [Sat, 22 May 2021 12:02:46 +0000 (20:02 +0800)]
net: ftgmac100: add missing error return code in ftgmac100_probe()

The variables will be free on path err_phy_connect, it should
return error code, or it will cause double free when calling
ftgmac100_remove().

Fixes: bd466c3fb5a4 ("net/faraday: Support NCSI mode")
Fixes: 39bfab8844a0 ("net: ftgmac100: Add support for DT phy-handle property")
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agodpaa2-eth: don't print error from dpaa2_mac_connect if that's EPROBE_DEFER
Vladimir Oltean [Fri, 21 May 2021 14:12:20 +0000 (17:12 +0300)]
dpaa2-eth: don't print error from dpaa2_mac_connect if that's EPROBE_DEFER

When booting a board with DPAA2 interfaces defined statically via DPL
(as opposed to creating them dynamically using restool), the driver will
print an unspecific error message.

This change adds the error code to the message, and avoids printing
altogether if the error code is EPROBE_DEFER, because that is not a
cause of alarm.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoMerge branch 'dpaa2-eth-of_node'
David S. Miller [Fri, 21 May 2021 21:05:04 +0000 (14:05 -0700)]
Merge branch 'dpaa2-eth-of_node'

Ioana Ciornei says:

====================
dpaa2-eth: setup the of_node

This patch set allows DSA to work with a DPAA2 master device by setting
up the of_node to point to the corresponding MAC DTS node, so that
of_find_net_device_by_node() can find it.
As an added benefit, udev rules can now be used to create a naming
scheme based on the physical MAC.

The second patch renames the debugfs directory to use the DPNI name
instead of the netdev name, since the latter can be changed after probe
time.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agodpaa2-eth: name the debugfs directory after the DPNI object
Ioana Ciornei [Fri, 21 May 2021 13:25:30 +0000 (16:25 +0300)]
dpaa2-eth: name the debugfs directory after the DPNI object

Name the debugfs directory after the DPNI object instead of the netdev
name since this can be changed after probe by udev rules.

Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agodpaa2-eth: setup the of_node field of the device
Ioana Ciornei [Fri, 21 May 2021 13:25:29 +0000 (16:25 +0300)]
dpaa2-eth: setup the of_node field of the device

When the DPNI object is connected to a DPMAC, setup the of_node to point
to the DTS device node of that specific MAC. This enables other drivers,
for example the DSA subsystem, to find the net_device by its device
node.

Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Tested-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoMerge branch 'sja1105-stats'
David S. Miller [Fri, 21 May 2021 21:01:41 +0000 (14:01 -0700)]
Merge branch 'sja1105-stats'

Vladimir Oltean says:

====================
Ethtool statistics counters cleanup for SJA1105 DSA driver

This series removes some reported data from ethtool -S which were not
counters at all, and reorganizes the code such that counters can be read
individually and not just all at once.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: dsa: sja1105: don't use burst SPI reads for port statistics
Vladimir Oltean [Fri, 21 May 2021 13:16:08 +0000 (16:16 +0300)]
net: dsa: sja1105: don't use burst SPI reads for port statistics

The current internal sja1105 driver API is optimized for retrieving many
statistics counters at once. But the switch does not do atomic snapshotting
for them anyway.

In case we start reporting the hardware port counters through
ndo_get_stats64 as well, not just ethtool, it would be good to be able
to read individual port counters and not all of them.

Additionally, since Arnd Bergmann's commit ae1804de93f6 ("dsa: sja1105:
dynamically allocate stats structure"), sja1105_get_ethtool_stats
allocates memory dynamically, since struct sja1105_port_status was
deemed to consume too much stack memory. That is not ideal.
The large structure is only needed because of the burst read.
If we read statistics one by one, we can consume less memory, and
we can avoid dynamic allocation.

Additionally, latency-sensitive interfaces such as PTP operations (for
phc2sys) might suffer if the SPI mutex is being held for too long, which
happens in the case of SPI burst reads. By reading counters one by one,
we give a chance for higher priority processes to preempt and take the
SPI bus mutex for accessing the PTP clock.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: dsa: sja1105: stop reporting the queue levels in ethtool port counters
Vladimir Oltean [Fri, 21 May 2021 13:16:07 +0000 (16:16 +0300)]
net: dsa: sja1105: stop reporting the queue levels in ethtool port counters

The queue levels are not counters, but instead they represent the
occupancy of the MAC TX queues. Having these in ethtool port counters is
not helpful, so remove them.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: hns3: Fix return of uninitialized variable ret
Colin Ian King [Fri, 21 May 2021 10:01:46 +0000 (11:01 +0100)]
net: hns3: Fix return of uninitialized variable ret

In the unlikely event that rule_cnt is zero the variable ret is
not assigned a value and function hclge_dbg_dump_fd_tcam can end
up returning an unitialized value in ret. Fix this by explicitly
setting ret to zero before the for-loop.

Addresses-Coverity: ("Uninitialized scalar variable")
Fixes: b5a0b70d77b9 ("net: hns3: refactor dump fd tcam of debugfs")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoatm: Fix typo
zuoqilin [Fri, 21 May 2021 09:45:22 +0000 (17:45 +0800)]
atm: Fix typo

Change 'contol' to 'control'.

Signed-off-by: zuoqilin <zuoqilin@yulong.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: phy: Fix inconsistent indenting
Jiapeng Chong [Fri, 21 May 2021 09:40:14 +0000 (17:40 +0800)]
net: phy: Fix inconsistent indenting

Eliminate the follow smatch warning:

drivers/net/phy/phy_device.c:2886 phy_probe() warn: inconsistent
indenting.

Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agosfc: farch: fix compile warning in efx_farch_dimension_resources()
Yang Yingliang [Fri, 21 May 2021 03:57:21 +0000 (11:57 +0800)]
sfc: farch: fix compile warning in efx_farch_dimension_resources()

Fix the following kernel build warning when CONFIG_SFC_SRIOV is disabled:

  drivers/net/ethernet/sfc/farch.c: In function ‘efx_farch_dimension_resources’:
  drivers/net/ethernet/sfc/farch.c:1671:21: warning: variable ‘buftbl_min’ set but not used [-Wunused-but-set-variable]
    unsigned vi_count, buftbl_min, total_tx_channels;

Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: bonding: bond_alb: Fix some typos in bond_alb.c
Wang Hai [Fri, 21 May 2021 03:31:35 +0000 (11:31 +0800)]
net: bonding: bond_alb: Fix some typos in bond_alb.c

s/becase/because/
s/reqeusts/requests/
s/funcions/functions/
s/addreses/addresses/

Signed-off-by: Wang Hai <wanghai38@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agocaif_virtio: Fix some typos in caif_virtio.c
Wang Hai [Fri, 21 May 2021 03:24:55 +0000 (11:24 +0800)]
caif_virtio: Fix some typos in caif_virtio.c

s/patckets/packets/
s/avilable/available/
s/tbe/the/

Signed-off-by: Wang Hai <wanghai38@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoMerge branch 'wan-cleanups'
David S. Miller [Fri, 21 May 2021 20:26:41 +0000 (13:26 -0700)]
Merge branch 'wan-cleanups'

Guangbin Huang says:

====================
net: wan: clean up some code style issues

This patchset clean up some code style issues.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: wan: add necessary () to macro argument
Peng Li [Fri, 21 May 2021 01:08:17 +0000 (09:08 +0800)]
net: wan: add necessary () to macro argument

Macro argument 'card' and 'port' may be better as
'(card)' and '(port)' to avoid precedence issues.

Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: wan: add braces {} to all arms of the statement
Peng Li [Fri, 21 May 2021 01:08:16 +0000 (09:08 +0800)]
net: wan: add braces {} to all arms of the statement

Braces {} should be used on all arms of this statement.

Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: wan: remove redundant blank lines
Peng Li [Fri, 21 May 2021 01:08:15 +0000 (09:08 +0800)]
net: wan: remove redundant blank lines

This patch removes some redundant blank lines.

Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: wan: fix the code style issue about trailing statements
Peng Li [Fri, 21 May 2021 01:08:14 +0000 (09:08 +0800)]
net: wan: fix the code style issue about trailing statements

Trailing statements should be on next line.

Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: wan: add some required spaces
Peng Li [Fri, 21 May 2021 01:08:13 +0000 (09:08 +0800)]
net: wan: add some required spaces

Add space required after that close brace '}'.
Add space required before the open parenthesis '('.

Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: wan: fix an code style issue about "foo* bar"
Peng Li [Fri, 21 May 2021 01:08:12 +0000 (09:08 +0800)]
net: wan: fix an code style issue about "foo* bar"

Fix the checkpatch error as "foo* bar" should be "foo *bar".

Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoMerge branch 'sja1105-spi'
David S. Miller [Fri, 21 May 2021 20:23:29 +0000 (13:23 -0700)]
Merge branch 'sja1105-spi'

Vladimir Oltean says:

====================
Adapt the sja1105 DSA driver to the SPI controller's transfer limits

This series changes the SPI transfer procedure in sja1105 to take into
consideration the buffer size limitations that the SPI controller driver
might have.

Changes in v2:
Remove the driver's use of cs_change and send multiple, smaller SPI
messages instead of a single large one.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: dsa: sja1105: adapt to a SPI controller with a limited max transfer size
Vladimir Oltean [Thu, 20 May 2021 21:16:57 +0000 (00:16 +0300)]
net: dsa: sja1105: adapt to a SPI controller with a limited max transfer size

The static config of the sja1105 switch is a long stream of bytes which
is programmed to the hardware in chunks (portions with the chip select
continuously asserted) of max 256 bytes each. Each chunk is a
spi_message composed of 2 spi_transfers: the buffer with the data and a
preceding buffer with the SPI access header.

Only that certain SPI controllers, such as the spi-sc18is602 I2C-to-SPI
bridge, cannot keep the chip select asserted for that long.
The spi_max_transfer_size() and spi_max_message_size() functions are how
the controller can impose its hardware limitations upon the SPI
peripheral driver.

For the sja1105 driver to work with these controllers, both buffers must
be smaller than the transfer limit, and their sum must be smaller than
the message limit.

Regression-tested on a switch connected to a controller with no
limitations (spi-fsl-dspi) as well as with one with caps for both
max_transfer_size and max_message_size (spi-sc18is602).

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: dsa: sja1105: send multiple spi_messages instead of using cs_change
Vladimir Oltean [Thu, 20 May 2021 21:16:56 +0000 (00:16 +0300)]
net: dsa: sja1105: send multiple spi_messages instead of using cs_change

The sja1105 driver has been described by Mark Brown as "not using the
[ SPI ] API at all idiomatically" due to the use of cs_change:
https://patchwork.kernel.org/project/netdevbpf/patch/20210520135031.2969183-1-olteanv@gmail.com/

According to include/linux/spi/spi.h, the chip select is supposed to be
asserted for the entire length of a SPI message, as long as cs_change is
false for all member transfers. The cs_change flag changes the following:

(i) When a non-final SPI transfer has cs_change = true, the chip select
    should temporarily deassert and then reassert starting with the next
    transfer.
(ii) When a final SPI transfer has cs_change = true, the chip select
     should remain asserted until the following SPI message.

The sja1105 driver only uses cs_change for its first property, to form a
single SPI message whose layout can be seen below:

                                             this is an entire, single spi_message
           _______________________________________________________________________________________________
          /                                                                                               \
          +-------------+---------------+-------------+---------------+ ... +-------------+---------------+
          | hdr_xfer[0] | chunk_xfer[0] | hdr_xfer[1] | chunk_xfer[1] |     | hdr_xfer[n] | chunk_xfer[n] |
          +-------------+---------------+-------------+---------------+ ... +-------------+---------------+
cs_change      false          true           false           true                false          false

           ____________________________  _____________________________       _____________________________
CS line __/                            \/                             \ ... /                             \__

The fact of the matter is that spi_max_message_size() has an ambiguous
meaning if any non-final transfer has cs_change = true.

If the SPI master has a limitation in that it cannot keep the chip
select asserted for more than, say, 200 bytes (like the spi-sc18is602),
the normal thing for it to do is to implement .max_transfer_size and
.max_message_size, and limit both to 200: in the "worst case" where
cs_change is always false, then the controller can, indeed, not send
messages larger than 200 bytes.

But the fact that the SPI controller's max_message_size does not
necessarily mean that we cannot send messages larger than that.
Notably, if the SPI master special-cases the transfers with cs_change
and treats every chip select toggling as an entirely new transaction,
then a SPI message can easily exceed that limit. So there is a
temptation to ignore the controller's reported max_message_size when
using cs_change = true in non-final transfers.

But that can lead to false conclusions. As Mark points out, the SPI
controller might have a different kind of limitation with the max
message size, that has nothing at all to do with how long it can keep
the chip select asserted.
For example, that might be the case if the device is able to offload the
chip select changes to the hardware as part of the data stream, and it
packs the entire stream of commands+data (corresponding to a SPI
message) into a single DMA transfer that is itself limited in size.

So the only thing we can do is avoid ambiguity by not using cs_change at
all. Instead of sending a single spi_message, we now send multiple SPI
messages as follows:

                  spi_message 0                 spi_message 1                       spi_message n
           ____________________________   ___________________________        _____________________________
          /                            \ /                           \      /                             \
          +-------------+---------------+-------------+---------------+ ... +-------------+---------------+
          | hdr_xfer[0] | chunk_xfer[0] | hdr_xfer[1] | chunk_xfer[1] |     | hdr_xfer[n] | chunk_xfer[n] |
          +-------------+---------------+-------------+---------------+ ... +-------------+---------------+
cs_change      false          true           false           true                false          false

           ____________________________  _____________________________       _____________________________
CS line __/                            \/                             \ ... /                             \__

which is clearer because the max_message_size limit is now easier to
enforce. What is transmitted on the wire stays, of course, the same.

Additionally, because we send no more than 2 transfers at a time, we now
avoid dynamic memory allocation too, which might be seen as an
improvement by some.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: phy: add driver for Motorcomm yt8511 phy
Peter Geis [Thu, 20 May 2021 16:32:30 +0000 (12:32 -0400)]
net: phy: add driver for Motorcomm yt8511 phy

Add a driver for the Motorcomm yt8511 phy that will be used in the
production Pine64 rk3566-quartz64 development board.
It supports gigabit transfer speeds, rgmii, and 125mhz clk output.

Signed-off-by: Peter Geis <pgwipeout@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: xilinx_emaclite: Do not print real IOMEM pointer
YueHaibing [Wed, 19 May 2021 02:47:04 +0000 (10:47 +0800)]
net: xilinx_emaclite: Do not print real IOMEM pointer

Printing kernel pointers is discouraged because they might leak kernel
memory layout.  This fixes smatch warning:

drivers/net/ethernet/xilinx/xilinx_emaclite.c:1191 xemaclite_of_probe() warn:
 argument 4 to %08lX specifier is cast from pointer

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: cdc_ncm: use DEVICE_ATTR_RW macro
YueHaibing [Thu, 20 May 2021 13:46:19 +0000 (21:46 +0800)]
net: cdc_ncm: use DEVICE_ATTR_RW macro

Use DEVICE_ATTR_RW helper instead of plain DEVICE_ATTR,
which makes the code a bit shorter and easier to read.

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: usb: hso: use DEVICE_ATTR_RO macro
YueHaibing [Thu, 20 May 2021 13:41:16 +0000 (21:41 +0800)]
net: usb: hso: use DEVICE_ATTR_RO macro

Use DEVICE_ATTR_RO helper instead of plain DEVICE_ATTR,
which makes the code a bit shorter and easier to read.

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: atm: use DEVICE_ATTR_RO macro
YueHaibing [Thu, 20 May 2021 13:36:45 +0000 (21:36 +0800)]
net: atm: use DEVICE_ATTR_RO macro

Use DEVICE_ATTR_RO helper instead of plain DEVICE_ATTR,
which makes the code a bit shorter and easier to read.

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoselftests: net: devlink_port_split.py: skip the test if no devlink device
Po-Hsu Lin [Thu, 20 May 2021 10:49:54 +0000 (18:49 +0800)]
selftests: net: devlink_port_split.py: skip the test if no devlink device

When there is no devlink device, the following command will return:
  $ devlink -j dev show
  {dev:{}}

This will cause IndexError when trying to access the first element
in dev of this json dataset. Use the kselftest framework skip code
to skip this test in this case.

Example output with this change:
  # selftests: net: devlink_port_split.py
  # no devlink device was found, test skipped
  ok 7 selftests: net: devlink_port_split.py # SKIP

Link: https://bugs.launchpad.net/bugs/1928889
Signed-off-by: Po-Hsu Lin <po-hsu.lin@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoibmvnic: remove default label from to_string switch
Michal Suchanek [Thu, 20 May 2021 06:50:34 +0000 (08:50 +0200)]
ibmvnic: remove default label from to_string switch

This way the compiler warns when a new value is added to the enum but
not to the string translation like:

drivers/net/ethernet/ibm/ibmvnic.c: In function 'adapter_state_to_string':
drivers/net/ethernet/ibm/ibmvnic.c:832:2: warning: enumeration value 'VNIC_FOOBAR' not handled in switch [-Wswitch]
  switch (state) {
  ^~~~~~
drivers/net/ethernet/ibm/ibmvnic.c: In function 'reset_reason_to_string':
drivers/net/ethernet/ibm/ibmvnic.c:1935:2: warning: enumeration value 'VNIC_RESET_FOOBAR' not handled in switch [-Wswitch]
  switch (reason) {
  ^~~~~~

Signed-off-by: Michal Suchanek <msuchanek@suse.de>
Acked-by: Lijun Pan <lijunp213@gmail.com>
Link: https://lore.kernel.org/netdev/CAOhMmr701LecfuNM+EozqbiTxFvDiXjFdY2aYeKJYaXq9kqVDg@mail.gmail.com/
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoNFC: st21nfca: remove unnecessary variable and labels
wengjianfeng [Thu, 20 May 2021 01:05:50 +0000 (09:05 +0800)]
NFC: st21nfca: remove unnecessary variable and labels

assign vlue (EIO/EPROTO) to variable r, and goto exit label,
but just return r follow exit label, so we delete exit label,
and just replace with return sentence.

Signed-off-by: wengjianfeng <wengjianfeng@yulong.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoMerge branch 'bond-cleanups'
David S. Miller [Thu, 20 May 2021 22:43:25 +0000 (15:43 -0700)]
Merge branch 'bond-cleanups'

Guangbin Huang says:

====================
net: bonding: clean up some code style issues

This patchset cleans up some code style issues.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: bonding: use tabs instead of space for code indent
Yufeng Mo [Thu, 20 May 2021 06:18:35 +0000 (14:18 +0800)]
net: bonding: use tabs instead of space for code indent

Code indent should use tabs where possible, so
use tabs instead of space for code indent.

Signed-off-by: Yufeng Mo <moyufeng@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: bonding: remove unnecessary braces
Yufeng Mo [Thu, 20 May 2021 06:18:34 +0000 (14:18 +0800)]
net: bonding: remove unnecessary braces

Braces {} are not necessary for single statement blocks,
so remove these braces {}.

Signed-off-by: Yufeng Mo <moyufeng@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: bonding: fix code indent for conditional statements
Yufeng Mo [Thu, 20 May 2021 06:18:33 +0000 (14:18 +0800)]
net: bonding: fix code indent for conditional statements

Fix incorrect code indent for conditional statements.

Signed-off-by: Yufeng Mo <moyufeng@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: bonding: add some required blank lines
Yufeng Mo [Thu, 20 May 2021 06:18:32 +0000 (14:18 +0800)]
net: bonding: add some required blank lines

Add some blank lines after declarations as required.

Signed-off-by: Yufeng Mo <moyufeng@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoMerge branch '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue
David S. Miller [Thu, 20 May 2021 22:13:28 +0000 (15:13 -0700)]
Merge branch '1GbE' of git://git./linux/kernel/git/tnguy/next-queue

Tony Nguyen says:

====================
1GbE Intel Wired LAN Driver Updates 2021-05-20

This series contains updates to igc driver only.

Andre Guedes says:

This series adds AF_XDP zero-copy feature to igc driver.

The initial patches do some code refactoring, preparing the code base to
land the AF_XDP zero-copy feature, avoiding code duplications. The last
patches of the series are the ones implementing the feature.

The last patch which indeed implements AF_XDP zero-copy support was
originally way too lengthy so, for the sake of code review, I broke it
up into two patches: one adding support for the RX functionality and the
other one adding TX support.
---
v2:
Patch 8/9 - "igc: Enable RX via AF_XDP zero-copy"
 * In XDP_PASS flow, copy metadata too into the skb.
 * When HW timestamp is added by the NIC, after copying it into
   a local variable, update xdp_buff->data_meta so that
   metadata length when XDP program is called 0.
 * In igc_xdp_enable_pool(), call xsk_pool_dma_unmap() on
   failure.

Known issues:
 When an XDP application is running in Tx-Only mode with Zero-Copy
 enabled, it is not expected to add the frames to the fill-queue. I have
 noticed the following two issues in this scenario:
 - If XDP_USE_NEED_WAKEUP flag is not set by application, igc_poll()
   will go into infinite loop because the buffer allocation resulting
   in igc_clean_rx_irq_zc() indicating that all work is not done and NAPI
   should keep polling. This does not occur if XDP_USE_NEED_WAKEUP
   flag is set.
 - Since there are no buffers allocated by userspace for the fill
   queue, there is no memory allocated for the NIC to copy the data
   to. If the packet received is destined to the hardware queue where
   XDP application is running, no packets are received even on other
   queues.
 Both these issues can be mitigated by adding a few frames to the
 fill queue. The second issue can also be mitigated by making sure no
 packets are being received on the hardware queue where Rx is running.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoMerge branch 'net-leading-spaces'
David S. Miller [Thu, 20 May 2021 22:10:57 +0000 (15:10 -0700)]
Merge branch 'net-leading-spaces'

Hui Tang says:

====================
net: remove leading spaces before tabs

There are a few leading spaces before tabs and remove it by running the
following commard:

        $ find . -name '*.[ch]' | xargs sed -r -i 's/^[ ]+\t/\t/'
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agomii: remove leading spaces before tabs
Hui Tang [Thu, 20 May 2021 03:47:54 +0000 (11:47 +0800)]
mii: remove leading spaces before tabs

There are a few leading spaces before tabs and remove it by running
the following commard:

    $ find . -name '*.[ch]' | xargs sed -r -i 's/^[ ]+\t/\t/'

Signed-off-by: Hui Tang <tanghui20@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoifb: remove leading spaces before tabs
Hui Tang [Thu, 20 May 2021 03:47:53 +0000 (11:47 +0800)]
ifb: remove leading spaces before tabs

There are a few leading spaces before tabs and remove it by running
the following commard:

    $ find . -name '*.[ch]' | xargs sed -r -i 's/^[ ]+\t/\t/'

Signed-off-by: Hui Tang <tanghui20@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: appletalk: remove leading spaces before tabs
Hui Tang [Thu, 20 May 2021 03:47:52 +0000 (11:47 +0800)]
net: appletalk: remove leading spaces before tabs

There are a few leading spaces before tabs and remove it by running
the following commard:

    $ find . -name '*.[ch]' | xargs sed -r -i 's/^[ ]+\t/\t/'

Cc: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Signed-off-by: Hui Tang <tanghui20@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: fddi: skfp: remove leading spaces before tabs
Hui Tang [Thu, 20 May 2021 03:47:51 +0000 (11:47 +0800)]
net: fddi: skfp: remove leading spaces before tabs

There are a few leading spaces before tabs and remove it by running
the following commard:

    $ find . -name '*.[ch]' | xargs sed -r -i 's/^[ ]+\t/\t/'

Cc: Lee Jones <lee.jones@linaro.org>
Signed-off-by: Hui Tang <tanghui20@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: hamradio: remove leading spaces before tabs
Hui Tang [Thu, 20 May 2021 03:47:50 +0000 (11:47 +0800)]
net: hamradio: remove leading spaces before tabs

There are a few leading spaces before tabs and remove it by running
the following commard:

    $ find . -name '*.[ch]' | xargs sed -r -i 's/^[ ]+\t/\t/'

Cc: Jiri Slaby <jirislaby@kernel.org>
Signed-off-by: Hui Tang <tanghui20@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: ppp: remove leading spaces before tabs
Hui Tang [Thu, 20 May 2021 03:47:49 +0000 (11:47 +0800)]
net: ppp: remove leading spaces before tabs

There are a few leading spaces before tabs and remove it by running
the following commard:

    $ find . -name '*.[ch]' | xargs sed -r -i 's/^[ ]+\t/\t/'

Cc: Tom Parkin <tparkin@katalix.com>
Signed-off-by: Hui Tang <tanghui20@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: slip: remove leading spaces before tabs
Hui Tang [Thu, 20 May 2021 03:47:48 +0000 (11:47 +0800)]
net: slip: remove leading spaces before tabs

There are a few leading spaces before tabs and remove it by running
the following commard:

    $ find . -name '*.[ch]' | xargs sed -r -i 's/^[ ]+\t/\t/'

Cc: Masahiro Yamada <masahiroy@kernel.org>
Signed-off-by: Hui Tang <tanghui20@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: usb: remove leading spaces before tabs
Hui Tang [Thu, 20 May 2021 03:47:47 +0000 (11:47 +0800)]
net: usb: remove leading spaces before tabs

There are a few leading spaces before tabs and remove it by running
the following commard:

    $ find . -name '*.[ch]' | xargs sed -r -i 's/^[ ]+\t/\t/'

Cc: Hayes Wang <hayeswang@realtek.com>
Signed-off-by: Hui Tang <tanghui20@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: wan: remove leading spaces before tabs
Hui Tang [Thu, 20 May 2021 03:47:46 +0000 (11:47 +0800)]
net: wan: remove leading spaces before tabs

There are a few leading spaces before tabs and remove it by running
the following commard:

$ find . -name '*.[ch]' | xargs sed -r -i 's/^[ ]+\t/\t/'

Cc: Xie He <xie.he.0141@gmail.com>
Signed-off-by: Hui Tang <tanghui20@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoMerge branch 'hns3-debugfs'
David S. Miller [Thu, 20 May 2021 22:01:04 +0000 (15:01 -0700)]
Merge branch 'hns3-debugfs'

Huazhong Tan says:

====================
net: hns3: refactor some debugfs commands

This series refactors the debugfs command to the new
process and removes the useless debugfs file node cmd
for the HNS3 ethernet driver.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: hns3: remove the useless debugfs file node cmd
Yufeng Mo [Thu, 20 May 2021 02:21:44 +0000 (10:21 +0800)]
net: hns3: remove the useless debugfs file node cmd

Currently, all debugfs commands have been reconstructed, and the
debugfs file node cmd is useless. So remove this debugfs file node.

Signed-off-by: Yufeng Mo <moyufeng@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: hns3: refactor dump serv info of debugfs
Yufeng Mo [Thu, 20 May 2021 02:21:43 +0000 (10:21 +0800)]
net: hns3: refactor dump serv info of debugfs

Currently, the debugfs command for serv info is implemented by
"echo xxxx > cmd", and record the inforamtion in dmesg. It's
unnecessary and heavy. To improve it, create a single file
"serv_info" for it, and query it by command "cat serv_info",
return the result to userspace, rather than record in dmesg.

The display style is below:
$ cat service_task_info
local_clock: [  114.203321]
delta: 784(ms)
last_service_task_processed: 4294918512(jiffies)
last_service_task_cnt: 4

Signed-off-by: Yufeng Mo <moyufeng@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: hns3: refactor dump mac tnl status of debugfs
Jiaran Zhang [Thu, 20 May 2021 02:21:42 +0000 (10:21 +0800)]
net: hns3: refactor dump mac tnl status of debugfs

Currently, the debugfs command for dump mac tnl status is
implemented by "echo xxxx > cmd", and record the information
in dmesg. It's unnecessary and heavy. To improve it, create
a single file "mac_tnl_status" for it, and query it by command
"cat mac_tnl_status", return the result to userspace, rather
than record in dmesg.

The display style is below:
$ cat mac_tnl_status
Recently generated mac tnl interruption:
[0111204.175437] status = 0x30
[0154120.329912] status = 0x30

Signed-off-by: Jiaran Zhang <zhangjiaran@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: hns3: refactor dump qs shaper of debugfs
Guangbin Huang [Thu, 20 May 2021 02:21:41 +0000 (10:21 +0800)]
net: hns3: refactor dump qs shaper of debugfs

Currently, user gets qset shaper parameters by implementing debugfs
command "echo dump qs shaper > cmd", this command will dump info in
dmesg. It's unnecessary and heavy.

As there is "tm_qset" file in tm directory for dump qset info, to
optimize these command, merge qset shaper parameters to tm_qset
file and use cat command to get them.

The display style is below:
$ cat tm_qset
ID    MAP_PRI  LINK_VLD  MODE  DWRR  IR_B  IR_U  IR_S  BS_B  BS_S  FLAG
0000     0        1      dwrr  100   150     7     0     5    20     0
0001     0        0        sp    0   150     7     0     5    20     0

Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: hns3: refactor dump qos buf cfg of debugfs
Guangbin Huang [Thu, 20 May 2021 02:21:40 +0000 (10:21 +0800)]
net: hns3: refactor dump qos buf cfg of debugfs

Currently, user gets qos buffer config by implementing debugfs command
"echo dump qos buf cfg > cmd", this command will dump info in dmesg.
It's unnecessary and heavy.

To optimize it, create a single file "qos_buf_cfg" in tm directory
and use cat command to get info. It will return info to userspace,
rather than record in dmesg.

The display style is below:
$ cat qos_buf_cfg
tx_packet_buf_tc_0: 0x120
tx_packet_buf_tc_1: 0x120
tx_packet_buf_tc_2: 0x120
tx_packet_buf_tc_3: 0x120
tx_packet_buf_tc_4: 0x0
tx_packet_buf_tc_5: 0x0
tx_packet_buf_tc_6: 0x0
tx_packet_buf_tc_7: 0x0
......

Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: hns3: refactor dump qos pri map of debugfs
Guangbin Huang [Thu, 20 May 2021 02:21:39 +0000 (10:21 +0800)]
net: hns3: refactor dump qos pri map of debugfs

Currently, user gets priority map by implementing debugfs command
"echo dump qos pri map > cmd", this command will dump info in dmesg.
It's unnecessary and heavy.

To optimize it, create a single file "qos_pri_map" in tm directory
and use cat command to get info. It will return info to userspace,
rather than record in dmesg.

The display style is below:
$ cat qos_pri_map
vlan_to_pri: 0
PRI    TC
0       0
1       1
2       2
3       3
4       0
5       1
6       2

Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: hns3: refactor dump qos pause cfg of debugfs
Guangbin Huang [Thu, 20 May 2021 02:21:38 +0000 (10:21 +0800)]
net: hns3: refactor dump qos pause cfg of debugfs

Currently, user gets pause config by implementing debugfs command
"echo dump qos pause cfg > cmd", this command will dump info in dmesg.
It's unnecessary and heavy.

To optimize it, create a single file "qos_pause_cfg" in tm directory
and use cat command to get info. It will return info to userspace,
rather than record in dmesg.

The display style is below:
$ cat qos_pause_cfg
pause_trans_gap: 0x7f
pause_trans_time: 0xffff

Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: hns3: refactor dump tc of debugfs
Guangbin Huang [Thu, 20 May 2021 02:21:37 +0000 (10:21 +0800)]
net: hns3: refactor dump tc of debugfs

Currently, user gets tc schedule info by implementing debugfs command
"echo dump tc > cmd", this command will dump info in dmesg. It's
unnecessary and heavy.

To optimize it, create a single file "tc_sch_info" and use cat command
to get info. It will return info to userspace, rather than record in
dmesg.

The display style is below:
$ cat tc_sch_info
enabled tc number: 4
weight_offset: 14
TC    MODE  WEIGHT
0     dwrr     25
1     dwrr     25
2     dwrr     25
3     dwrr     25
4     dwrr      0
5     dwrr      0
6     dwrr      0
7     dwrr      0

Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: hns3: refactor dump tm of debugfs
Guangbin Huang [Thu, 20 May 2021 02:21:36 +0000 (10:21 +0800)]
net: hns3: refactor dump tm of debugfs

Currently, user gets some tm info by implementing debugfs command
"echo dump tm > cmd", this command will dump info in dmesg. It's
unnecessary and heavy.

In addition, the info of this command mixes info of qset, priority,
pg and port. Qset and priority have their own command to get info of
themself, so can remove info of qset and priority from this command.

To optimize it, create two new files "tm_pg", "tm_port" in tm directory
and use cat command to separately get info of pg and port.

The display style is below:
$ cat tm_pg
ID  PRI_MAP  MODE DWRR  C_IR_B  C_IR_U  C_IR_S  C_BS_B  C_BS_S ...
00   0x1f    dwrr  1       75       9       0      31      20  ...

$ cat tm_port
IR_B  IR_U  IR_S  BS_B  BS_S  FLAG  RATE(Mbps)
75     9     0    31    20    1     200000

Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: hns3: refactor dump tm map of debugfs
Guangbin Huang [Thu, 20 May 2021 02:21:35 +0000 (10:21 +0800)]
net: hns3: refactor dump tm map of debugfs

Currently, the debugfs command for tm map is implemented by
"echo xxxx > cmd", and record the information in dmesg. It's
unnecessary and heavy. To improve it, create a single file
"tm_map" for it, and query it by command "cat tm_map",
return the result to userspace, rather than record in dmesg.

As user can't specify queue id in cat command, driver will return info
of all queue id.

The display style is below:
$ cat tm_map
queue_id   qset_id   pri_id   tc_id
0000         0000      00       00
INDEX | TM BP QSET MAPPING:
0000  | 00000000:00000000:00000000:00000000:00000000:00000000:00000000
0256  | 00000000:00000000:00000000:00000000:00000000:00000002:00000000
0512  | 00000000:00000000:00000000:00000004:00000000:00000000:00000000
0768  | 00000000:00000008:00000000:00000000:00000000:00000000:00000000

Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: hns3: refactor dump fd tcam of debugfs
Hao Chen [Thu, 20 May 2021 02:21:34 +0000 (10:21 +0800)]
net: hns3: refactor dump fd tcam of debugfs

Currently, the debugfs command for fd tcam is implemented by
"echo xxxx > cmd", and record the information in dmesg. It's
unnecessary and heavy. To improve it, create a single file
"fd_tcam" for it, and query it by command "cat fd_tcam",
return the result to userspace, rather than record in dmesg.

The display style is below:
$ cat fd_tcam
read result tcam key x(31):
00000000
00000000
00000000
08000000
00000600
00000000
00000000
00000000
00000000
00000000
00000000
00000000
00000000
read result tcam key y(31):
00000000
00000000
00000000
f7ff0000
0000f900
00000000
00000000
00000000
00000000
00000000
00000000
00000000
0000fff8

Signed-off-by: Hao Chen <chenhao288@hisilicon.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>