platform/kernel/linux-rpi.git
4 years agolibbpf: Fix realloc usage in bpf_core_find_cands
Andrii Nakryiko [Fri, 24 Jan 2020 20:18:46 +0000 (12:18 -0800)]
libbpf: Fix realloc usage in bpf_core_find_cands

Fix bug requesting invalid size of reallocated array when constructing CO-RE
relocation candidate list. This can cause problems if there are many potential
candidates and a very fine-grained memory allocator bucket sizes are used.

Fixes: ddc7c3042614 ("libbpf: implement BPF CO-RE offset relocation algorithm")
Reported-by: William Smith <williampsmith@fb.com>
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20200124201847.212528-1-andriin@fb.com
4 years agolibbpf: Improve handling of failed CO-RE relocations
Andrii Nakryiko [Fri, 24 Jan 2020 05:38:37 +0000 (21:38 -0800)]
libbpf: Improve handling of failed CO-RE relocations

Previously, if libbpf failed to resolve CO-RE relocation for some
instructions, it would either return error immediately, or, if
.relaxed_core_relocs option was set, would replace relocatable offset/imm part
of an instruction with a bogus value (-1). Neither approach is good, because
there are many possible scenarios where relocation is expected to fail (e.g.,
when some field knowingly can be missing on specific kernel versions). On the
other hand, replacing offset with invalid one can hide programmer errors, if
this relocation failue wasn't anticipated.

This patch deprecates .relaxed_core_relocs option and changes the approach to
always replacing instruction, for which relocation failed, with invalid BPF
helper call instruction. For cases where this is expected, BPF program should
already ensure that that instruction is unreachable, in which case this
invalid instruction is going to be silently ignored. But if instruction wasn't
guarded, BPF program will be rejected at verification step with verifier log
pointing precisely to the place in assembly where the problem is.

Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20200124053837.2434679-1-andriin@fb.com
4 years agoselftests: bpf: Reset global state between reuseport test runs
Lorenz Bauer [Fri, 24 Jan 2020 11:27:54 +0000 (11:27 +0000)]
selftests: bpf: Reset global state between reuseport test runs

Currently, there is a lot of false positives if a single reuseport test
fails. This is because expected_results and the result map are not cleared.

Zero both after individual test runs, which fixes the mentioned false
positives.

Fixes: 91134d849a0e ("bpf: Test BPF_PROG_TYPE_SK_REUSEPORT")
Signed-off-by: Lorenz Bauer <lmb@cloudflare.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20200124112754.19664-5-lmb@cloudflare.com
4 years agoselftests: bpf: Make reuseport test output more legible
Lorenz Bauer [Fri, 24 Jan 2020 11:27:53 +0000 (11:27 +0000)]
selftests: bpf: Make reuseport test output more legible

Include the name of the mismatching result in human readable format
when reporting an error. The new output looks like the following:

  unexpected result
   result: [1, 0, 0, 0, 0, 0]
  expected: [0, 0, 0, 0, 0, 0]
  mismatch on DROP_ERR_INNER_MAP (bpf_prog_linum:153)
  check_results:FAIL:382

Signed-off-by: Lorenz Bauer <lmb@cloudflare.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20200124112754.19664-4-lmb@cloudflare.com
4 years agoselftests: bpf: Ignore FIN packets for reuseport tests
Lorenz Bauer [Fri, 24 Jan 2020 11:27:52 +0000 (11:27 +0000)]
selftests: bpf: Ignore FIN packets for reuseport tests

The reuseport tests currently suffer from a race condition: FIN
packets count towards DROP_ERR_SKB_DATA, since they don't contain
a valid struct cmd. Tests will spuriously fail depending on whether
check_results is called before or after the FIN is processed.

Exit the BPF program early if FIN is set.

Fixes: 91134d849a0e ("bpf: Test BPF_PROG_TYPE_SK_REUSEPORT")
Signed-off-by: Lorenz Bauer <lmb@cloudflare.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20200124112754.19664-3-lmb@cloudflare.com
4 years agoselftests: bpf: Use a temporary file in test_sockmap
Lorenz Bauer [Fri, 24 Jan 2020 11:27:51 +0000 (11:27 +0000)]
selftests: bpf: Use a temporary file in test_sockmap

Use a proper temporary file for sendpage tests. This means that running
the tests doesn't clutter the working directory, and allows running the
test on read-only filesystems.

Fixes: 16962b2404ac ("bpf: sockmap, add selftests")
Signed-off-by: Lorenz Bauer <lmb@cloudflare.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20200124112754.19664-2-lmb@cloudflare.com
4 years agobpftool: Print function linkage in BTF dump
Andrii Nakryiko [Fri, 24 Jan 2020 05:43:17 +0000 (21:43 -0800)]
bpftool: Print function linkage in BTF dump

Add printing out BTF_KIND_FUNC's linkage.

Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://lore.kernel.org/bpf/20200124054317.2459436-1-andriin@fb.com
4 years agoselftests/bpf: Improve bpftool changes detection
Andrii Nakryiko [Fri, 24 Jan 2020 05:41:48 +0000 (21:41 -0800)]
selftests/bpf: Improve bpftool changes detection

Detect when bpftool source code changes and trigger rebuild within
selftests/bpf Makefile. Also fix few small formatting problems.

Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://lore.kernel.org/bpf/20200124054148.2455060-1-andriin@fb.com
4 years agoselftests/bpf: Initialize duration variable before using
John Sperbeck [Thu, 23 Jan 2020 23:51:44 +0000 (15:51 -0800)]
selftests/bpf: Initialize duration variable before using

The 'duration' variable is referenced in the CHECK() macro, and there are
some uses of the macro before 'duration' is set.  The clang compiler
(validly) complains about this.

Sample error:

.../selftests/bpf/prog_tests/fexit_test.c:23:6: warning: variable 'duration' is uninitialized when used here [-Wuninitialized]
        if (CHECK(err, "prog_load sched cls", "err %d errno %d\n", err, errno))
            ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.../selftests/bpf/test_progs.h:134:25: note: expanded from macro 'CHECK'
        if (CHECK(err, "prog_load sched cls", "err %d errno %d\n", err, errno))
            ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
        _CHECK(condition, tag, duration, format)
                               ^~~~~~~~

Signed-off-by: John Sperbeck <jsperbeck@google.com>
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20200123235144.93610-1-sdf@google.com
4 years agobpf, devmap: Pass lockdep expression to RCU lists
Amol Grover [Thu, 23 Jan 2020 12:04:38 +0000 (17:34 +0530)]
bpf, devmap: Pass lockdep expression to RCU lists

head is traversed using hlist_for_each_entry_rcu outside an RCU
read-side critical section but under the protection of dtab->index_lock.

Hence, add corresponding lockdep expression to silence false-positive
lockdep warnings, and harden RCU lists.

Fixes: 6f9d451ab1a3 ("xdp: Add devmap_hash map type for looking up devices by hashed index")
Signed-off-by: Amol Grover <frextrite@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://lore.kernel.org/bpf/20200123120437.26506-1-frextrite@gmail.com
4 years agonet: convert suitable drivers to use phy_do_ioctl_running
Heiner Kallweit [Tue, 21 Jan 2020 21:09:33 +0000 (22:09 +0100)]
net: convert suitable drivers to use phy_do_ioctl_running

Convert suitable drivers to use new helper phy_do_ioctl_running.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Acked-by: Timur Tabi <timur@kernel.org>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
David S. Miller [Thu, 23 Jan 2020 07:10:16 +0000 (08:10 +0100)]
Merge git://git./linux/kernel/git/bpf/bpf-next

Alexei Starovoitov says:

====================
pull-request: bpf-next 2020-01-22

The following pull-request contains BPF updates for your *net-next* tree.

We've added 92 non-merge commits during the last 16 day(s) which contain
a total of 320 files changed, 7532 insertions(+), 1448 deletions(-).

The main changes are:

1) function by function verification and program extensions from Alexei.

2) massive cleanup of selftests/bpf from Toke and Andrii.

3) batched bpf map operations from Brian and Yonghong.

4) tcp congestion control in bpf from Martin.

5) bulking for non-map xdp_redirect form Toke.

6) bpf_send_signal_thread helper from Yonghong.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agoMerge branch 'bpf_cubic'
Alexei Starovoitov [Thu, 23 Jan 2020 00:30:11 +0000 (16:30 -0800)]
Merge branch 'bpf_cubic'

Martin KaFai Lau says:

====================
This set adds bpf_cubic.c example.  It was separated from the
earlier BPF STRUCT_OPS series.  Some highlights since the
last post:

1. It is based on EricD recent fixes to the kernel tcp_cubic. [1]
2. The bpf jiffies reading helper is inlined by the verifier.
   Different from the earlier version, it only reads jiffies alone
   and does not do usecs/jiffies conversion.
3. The bpf .kconfig map is used to read CONFIG_HZ.

[1]: https://patchwork.ozlabs.org/cover/1215066/

v3:
- Remove __weak from CONFIG_HZ in patch 3. (Andrii)

v2:
- Move inlining to fixup_bpf_calls() in patch 1. (Daniel)
- It is inlined for 64 BITS_PER_LONG and jit_requested
  as the map_gen_lookup().  Other cases could be
  considered together with map_gen_lookup() if needed.
- Use usec resolution in bictcp_update() calculation in patch 3.
  usecs_to_jiffies() is then removed().  (Eric)
====================

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
4 years agobpf: tcp: Add bpf_cubic example
Martin KaFai Lau [Wed, 22 Jan 2020 23:36:58 +0000 (15:36 -0800)]
bpf: tcp: Add bpf_cubic example

This patch adds a bpf_cubic example.  Some highlights:
1. CONFIG_HZ .kconfig map is used.
2. In bictcp_update(), calculation is changed to use usec
   resolution (i.e. USEC_PER_JIFFY) instead of using jiffies.
   Thus, usecs_to_jiffies() is not used in the bpf_cubic.c.
3. In bitctcp_update() [under tcp_friendliness], the original
   "while (ca->ack_cnt > delta)" loop is changed to the equivalent
   "ca->ack_cnt / delta" operation.

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200122233658.903774-1-kafai@fb.com
4 years agobpf: Sync uapi bpf.h to tools/
Martin KaFai Lau [Wed, 22 Jan 2020 23:36:52 +0000 (15:36 -0800)]
bpf: Sync uapi bpf.h to tools/

This patch sync uapi bpf.h to tools/.

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200122233652.903348-1-kafai@fb.com
4 years agobpf: Add BPF_FUNC_jiffies64
Martin KaFai Lau [Wed, 22 Jan 2020 23:36:46 +0000 (15:36 -0800)]
bpf: Add BPF_FUNC_jiffies64

This patch adds a helper to read the 64bit jiffies.  It will be used
in a later patch to implement the bpf_cubic.c.

The helper is inlined for jit_requested and 64 BITS_PER_LONG
as the map_gen_lookup().  Other cases could be considered together
with map_gen_lookup() if needed.

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200122233646.903260-1-kafai@fb.com
4 years agoMerge branch 'bpf-dynamic-relinking'
Daniel Borkmann [Wed, 22 Jan 2020 22:04:53 +0000 (23:04 +0100)]
Merge branch 'bpf-dynamic-relinking'

Alexei Starovoitov says:

====================
The last few month BPF community has been discussing an approach to call
chaining, since exiting bpt_tail_call() mechanism used in production XDP
programs has plenty of downsides. The outcome of these discussion was a
conclusion to implement dynamic re-linking of BPF programs. Where rootlet XDP
program attached to a netdevice can programmatically define a policy of
execution of other XDP programs. Such rootlet would be compiled as normal XDP
program and provide a number of placeholder global functions which later can be
replaced with future XDP programs. BPF trampoline, function by function
verification were building blocks towards that goal. The patch 1 is a final
building block. It introduces dynamic program extensions. A number of
improvements like more flexible function by function verification and better
libbpf api will be implemented in future patches.

v1->v2:
- addressed Andrii's comments
- rebase
====================

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
4 years agoselftests/bpf: Add tests for program extensions
Alexei Starovoitov [Tue, 21 Jan 2020 00:53:48 +0000 (16:53 -0800)]
selftests/bpf: Add tests for program extensions

Add program extension tests that build on top of fexit_bpf2bpf tests.
Replace three global functions in previously loaded test_pkt_access.c program
with three new implementations:
int get_skb_len(struct __sk_buff *skb);
int get_constant(long val);
int get_skb_ifindex(int val, struct __sk_buff *skb, int var);
New function return the same results as original only if arguments match.

new_get_skb_ifindex() demonstrates that 'skb' argument doesn't have to be first
and only argument of BPF program. All normal skb based accesses are available.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://lore.kernel.org/bpf/20200121005348.2769920-4-ast@kernel.org
4 years agolibbpf: Add support for program extensions
Alexei Starovoitov [Tue, 21 Jan 2020 00:53:47 +0000 (16:53 -0800)]
libbpf: Add support for program extensions

Add minimal support for program extensions. bpf_object_open_opts() needs to be
called with attach_prog_fd = target_prog_fd and BPF program extension needs to
have in .c file section definition like SEC("freplace/func_to_be_replaced").
libbpf will search for "func_to_be_replaced" in the target_prog_fd's BTF and
will pass it in attach_btf_id to the kernel. This approach works for tests, but
more compex use case may need to request function name (and attach_btf_id that
kernel sees) to be more dynamic. Such API will be added in future patches.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://lore.kernel.org/bpf/20200121005348.2769920-3-ast@kernel.org
4 years agobpf: Introduce dynamic program extensions
Alexei Starovoitov [Tue, 21 Jan 2020 00:53:46 +0000 (16:53 -0800)]
bpf: Introduce dynamic program extensions

Introduce dynamic program extensions. The users can load additional BPF
functions and replace global functions in previously loaded BPF programs while
these programs are executing.

Global functions are verified individually by the verifier based on their types only.
Hence the global function in the new program which types match older function can
safely replace that corresponding function.

This new function/program is called 'an extension' of old program. At load time
the verifier uses (attach_prog_fd, attach_btf_id) pair to identify the function
to be replaced. The BPF program type is derived from the target program into
extension program. Technically bpf_verifier_ops is copied from target program.
The BPF_PROG_TYPE_EXT program type is a placeholder. It has empty verifier_ops.
The extension program can call the same bpf helper functions as target program.
Single BPF_PROG_TYPE_EXT type is used to extend XDP, SKB and all other program
types. The verifier allows only one level of replacement. Meaning that the
extension program cannot recursively extend an extension. That also means that
the maximum stack size is increasing from 512 to 1024 bytes and maximum
function nesting level from 8 to 16. The programs don't always consume that
much. The stack usage is determined by the number of on-stack variables used by
the program. The verifier could have enforced 512 limit for combined original
plus extension program, but it makes for difficult user experience. The main
use case for extensions is to provide generic mechanism to plug external
programs into policy program or function call chaining.

BPF trampoline is used to track both fentry/fexit and program extensions
because both are using the same nop slot at the beginning of every BPF
function. Attaching fentry/fexit to a function that was replaced is not
allowed. The opposite is true as well. Replacing a function that currently
being analyzed with fentry/fexit is not allowed. The executable page allocated
by BPF trampoline is not used by program extensions. This inefficiency will be
optimized in future patches.

Function by function verification of global function supports scalars and
pointer to context only. Hence program extensions are supported for such class
of global functions only. In the future the verifier will be extended with
support to pointers to structures, arrays with sizes, etc.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://lore.kernel.org/bpf/20200121005348.2769920-2-ast@kernel.org
4 years agonet: convert additional drivers to use phy_do_ioctl
Heiner Kallweit [Tue, 21 Jan 2020 21:05:14 +0000 (22:05 +0100)]
net: convert additional drivers to use phy_do_ioctl

The first batch of driver conversions missed a few cases where we can
use phy_do_ioctl too.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agobpf, btf: Always output invariant hit in pahole DWARF to BTF transform
Chris Down [Wed, 22 Jan 2020 00:01:10 +0000 (00:01 +0000)]
bpf, btf: Always output invariant hit in pahole DWARF to BTF transform

When trying to compile with CONFIG_DEBUG_INFO_BTF enabled, I got this
error:

    % make -s
    Failed to generate BTF for vmlinux
    Try to disable CONFIG_DEBUG_INFO_BTF
    make[3]: *** [vmlinux] Error 1

Compiling again without -s shows the true error (that pahole is
missing), but since this is fatal, we should show the error
unconditionally on stderr as well, not silence it using the `info`
function. With this patch:

    % make -s
    BTF: .tmp_vmlinux.btf: pahole (pahole) is not available
    Failed to generate BTF for vmlinux
    Try to disable CONFIG_DEBUG_INFO_BTF
    make[3]: *** [vmlinux] Error 1

Signed-off-by: Chris Down <chris@chrisdown.name>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/20200122000110.GA310073@chrisdown.name
4 years agoselftests/bpf: Build urandom_read with LDFLAGS and LDLIBS
Daniel Díaz [Wed, 22 Jan 2020 16:44:24 +0000 (17:44 +0100)]
selftests/bpf: Build urandom_read with LDFLAGS and LDLIBS

During cross-compilation, it was discovered that LDFLAGS and
LDLIBS were not being used while building binaries, leading
to defaults which were not necessarily correct.

OpenEmbedded reported this kind of problem:

  ERROR: QA Issue: No GNU_HASH in the ELF binary [...], didn't pass LDFLAGS?

Signed-off-by: Daniel Díaz <daniel.diaz@linaro.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
4 years agobpf: Fix error path under memory pressure
Alexei Starovoitov [Wed, 22 Jan 2020 02:41:38 +0000 (18:41 -0800)]
bpf: Fix error path under memory pressure

Restore the 'if (env->cur_state)' check that was incorrectly removed during
code move. Under memory pressure env->cur_state can be freed and zeroed inside
do_check(). Hence the check is necessary.

Fixes: 51c39bb1d5d1 ("bpf: Introduce function-by-function verification")
Reported-by: syzbot+b296579ba5015704d9fa@syzkaller.appspotmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/20200122024138.3385590-1-ast@kernel.org
4 years agobpf: Fix trampoline usage in preempt
Alexei Starovoitov [Tue, 21 Jan 2020 03:22:31 +0000 (19:22 -0800)]
bpf: Fix trampoline usage in preempt

Though the second half of trampoline page is unused a task could be
preempted in the middle of the first half of trampoline and two
updates to trampoline would change the code from underneath the
preempted task. Hence wait for tasks to voluntarily schedule or go
to userspace. Add similar wait before freeing the trampoline.

Fixes: fec56f5890d9 ("bpf: Introduce BPF trampoline")
Reported-by: Jann Horn <jannh@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Paul E. McKenney <paulmck@kernel.org>
Link: https://lore.kernel.org/bpf/20200121032231.3292185-1-ast@kernel.org
4 years agoxsk, net: Make sock_def_readable() have external linkage
Björn Töpel [Mon, 20 Jan 2020 09:29:17 +0000 (10:29 +0100)]
xsk, net: Make sock_def_readable() have external linkage

XDP sockets use the default implementation of struct sock's
sk_data_ready callback, which is sock_def_readable(). This function
is called in the XDP socket fast-path, and involves a retpoline. By
letting sock_def_readable() have external linkage, and being called
directly, the retpoline can be avoided.

Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20200120092917.13949-1-bjorn.topel@gmail.com
4 years agobpf: don't bother with getname/kern_path - use user_path_at
Al Viro [Mon, 20 Jan 2020 23:28:58 +0000 (23:28 +0000)]
bpf: don't bother with getname/kern_path - use user_path_at

kernel/bpf/inode.c misuses kern_path...() - it's much simpler (and
more efficient, on top of that) to use user_path...() counterparts
rather than bothering with doing getname() manually.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20200120232858.GF8904@ZenIV.linux.org.uk
4 years agoMerge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec...
David S. Miller [Tue, 21 Jan 2020 11:18:20 +0000 (12:18 +0100)]
Merge branch 'master' of git://git./linux/kernel/git/klassert/ipsec-next

Steffen Klassert says:

====================
pull request (net-next): ipsec-next 2020-01-21

1) Add support for TCP encapsulation of IKE and ESP messages,
   as defined by RFC 8229. Patchset from Sabrina Dubroca.

Please note that there is a merge conflict in:

net/unix/af_unix.c

between commit:

3c32da19a858 ("unix: Show number of pending scm files of receive queue in fdinfo")

from the net-next tree and commit:

b50b0580d27b ("net: add queue argument to __skb_wait_for_more_packets and __skb_{,try_}recv_datagram")

from the ipsec-next tree.

The conflict can be solved as done in linux-next.

Please pull or let me know if there are problems.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agodrivers: net: declance: fix comparing pointer to 0
Chen Zhou [Tue, 21 Jan 2020 09:24:55 +0000 (17:24 +0800)]
drivers: net: declance: fix comparing pointer to 0

Fixes coccicheck warning:

./drivers/net/ethernet/amd/declance.c:611:14-15:
WARNING comparing pointer to 0

Replace "skb == 0" with "!skb".

Signed-off-by: Chen Zhou <chenzhou10@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agotcp/ipv4: remove AF_INET_FAMILY
Alex Shi [Tue, 21 Jan 2020 08:50:07 +0000 (16:50 +0800)]
tcp/ipv4: remove AF_INET_FAMILY

After commit 079096f103fa ("tcp/dccp: install syn_recv requests into ehash table")
the macro isn't used anymore. remove it.

Signed-off-by: Alex Shi <alex.shi@linux.alibaba.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
Cc: netdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet/hsr: remove seq_nr_after_or_eq
Alex Shi [Tue, 21 Jan 2020 08:49:53 +0000 (16:49 +0800)]
net/hsr: remove seq_nr_after_or_eq

It's never used after introduced. So maybe better to remove.

Signed-off-by: Alex Shi <alex.shi@linux.alibaba.com>
Cc: Arvid Brodin <arvid.brodin@alten.se>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: netdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agohdlx_x25: Fix backwards compat test.
David S. Miller [Tue, 21 Jan 2020 11:02:25 +0000 (12:02 +0100)]
hdlx_x25: Fix backwards compat test.

drivers/net/wan/hdlc_x25.c: In function ‘x25_ioctl’:
drivers/net/wan/hdlc_x25.c:256:7: warning: suggest parentheses around assignment used as truth value [-Wparentheses]
  256 |   if (ifr->ifr_settings.size = 0) {
      |       ^~~

Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agoMerge branch 'hns3-next'
David S. Miller [Tue, 21 Jan 2020 10:46:21 +0000 (11:46 +0100)]
Merge branch 'hns3-next'

Huazhong Tan says:

====================
net: hns3: misc updates for -net-next

This series includes some misc updates for the HNS3 ethernet driver.

[patch 1] adds a limitation for the error log in the
hns3_clean_tx_ring().
[patch 2] adds a check for pfmemalloc flag before reusing pages
since these pages may be used some special case.
[patch 3] assigns a default reset type 'HNAE3_NONE_RESET' to
VF's reset_type after initializing or reset.
[patch 4] unifies macro HCLGE_DFX_REG_TYPE_CNT's definition into
header file.
[patch 5] refines the parameter 'size' of snprintf() in the
hns3_init_module().
[patch 6] rewrites a debug message in hclge_put_vector().
[patch 7~9] adds some cleanups related to coding style.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: hns3: cleanup some coding style issue
Huazhong Tan [Tue, 21 Jan 2020 08:42:13 +0000 (16:42 +0800)]
net: hns3: cleanup some coding style issue

This patch removes some unnecessary return value assignments,
some duplicated printing in the caller, refines the judgment
of 0 and uses le16_to_cpu to replace __le16_to_cpu.

Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: hns3: remove redundant print on ENOMEM
Huazhong Tan [Tue, 21 Jan 2020 08:42:12 +0000 (16:42 +0800)]
net: hns3: remove redundant print on ENOMEM

All kmalloc-based functions print enough information on failures.
So this patch removes the log in hclge_get_dfx_reg() when returns
ENOMEM.

Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: hns3: delete unnecessary blank line and space for cleanup
Guangbin Huang [Tue, 21 Jan 2020 08:42:11 +0000 (16:42 +0800)]
net: hns3: delete unnecessary blank line and space for cleanup

This patch deletes some unnecessary blank lines and spaces to clean up
code, and in hclgevf_set_vlan_filter() moves the comment to the front
of hclgevf_send_mbx_msg().

Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: hns3: rewrite a log in hclge_put_vector()
Yonglong Liu [Tue, 21 Jan 2020 08:42:10 +0000 (16:42 +0800)]
net: hns3: rewrite a log in hclge_put_vector()

When gets vector fails, hclge_put_vector() should print out
the vector instead of vector_id in the log and return the wrong
vector_id to its caller.

Signed-off-by: Yonglong Liu <liuyonglong@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: hns3: refine the input parameter 'size' for snprintf()
Guojia Liao [Tue, 21 Jan 2020 08:42:09 +0000 (16:42 +0800)]
net: hns3: refine the input parameter 'size' for snprintf()

The function snprintf() writes at most size bytes (including the
terminating null byte ('\0') to str. Now, We can guarantee that the
parameter of size is lager than the length of str to be formatting
including its terminating null byte. So it's unnecessary to minus 1
for the input parameter 'size'.

Signed-off-by: Guojia Liao <liaoguojia@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: hns3: move duplicated macro definition into header
Guojia Liao [Tue, 21 Jan 2020 08:42:08 +0000 (16:42 +0800)]
net: hns3: move duplicated macro definition into header

Macro HCLGE_GET_DFX_REG_TYPE_CNT in hclge_dbg_get_dfx_bd_num()
and macro HCLGE_DFX_REG_BD_NUM in hclge_get_dfx_reg_bd_num()
have the same meaning, so just defines HCLGE_GET_DFX_REG_TYPE_CNT
in hclge_main.h.

Signed-off-by: Guojia Liao <liaoguojia@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: hns3: set VF's default reset_type to HNAE3_NONE_RESET
Huazhong Tan [Tue, 21 Jan 2020 08:42:07 +0000 (16:42 +0800)]
net: hns3: set VF's default reset_type to HNAE3_NONE_RESET

reset_type means what kind of reset the driver is handling now,
so after initializing or reset, the reset_type of VF should be
set to HNAE3_NONE_RESET, otherwise, this unknown default value
may be a little misleading when the device is running.

Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: hns3: do not reuse pfmemalloc pages
Yunsheng Lin [Tue, 21 Jan 2020 08:42:06 +0000 (16:42 +0800)]
net: hns3: do not reuse pfmemalloc pages

HNS3 driver allocates pages for DMA with dev_alloc_pages(), which
calls alloc_pages_node() with the __GFP_MEMALLOC flag. So, in case
of OOM condition, HNS3 can get pages with pfmemalloc flag set.

So do not reuse the pages with pfmemalloc flag set because those
pages are reserved for special cases, such as low memory case.

Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: hns3: limit the error logging in the hns3_clean_tx_ring()
Yunsheng Lin [Tue, 21 Jan 2020 08:42:05 +0000 (16:42 +0800)]
net: hns3: limit the error logging in the hns3_clean_tx_ring()

The error log printed by netdev_err() in the hns3_clean_tx_ring()
may spam the kernel log.

This patch uses hns3_rl_err() to ratelimit the error log in the
hns3_clean_tx_ring().

Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agowan/hdlc_x25: fix skb handling
Martin Schiller [Tue, 21 Jan 2020 06:00:34 +0000 (07:00 +0100)]
wan/hdlc_x25: fix skb handling

o call skb_reset_network_header() before hdlc->xmit()
 o change skb proto to HDLC (0x0019) before hdlc->xmit()
 o call dev_queue_xmit_nit() before hdlc->xmit()

This changes make it possible to trace (tcpdump) outgoing layer2
(ETH_P_HDLC) packets

Additionally call skb_reset_network_header() after each skb_push() /
skb_pull().

Signed-off-by: Martin Schiller <ms@dev.tdt.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agowan/hdlc_x25: make lapb params configurable
Martin Schiller [Tue, 21 Jan 2020 06:00:33 +0000 (07:00 +0100)]
wan/hdlc_x25: make lapb params configurable

This enables you to configure mode (DTE/DCE), Modulo, Window, T1, T2, N2 via
sethdlc (which needs to be patched as well).

Signed-off-by: Martin Schiller <ms@dev.tdt.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet/smc: allow unprivileged users to read pnet table
Hans Wippel [Tue, 21 Jan 2020 00:04:46 +0000 (01:04 +0100)]
net/smc: allow unprivileged users to read pnet table

The current flags of the SMC_PNET_GET command only allow privileged
users to retrieve entries from the pnet table via netlink. The content
of the pnet table may be useful for all users though, e.g., for
debugging smc connection problems.

This patch removes the GENL_ADMIN_PERM flag so that unprivileged users
can read the pnet table.

Signed-off-by: Hans Wippel <ndev@hwipl.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agoMerge branch 'phy-add-new-version-of-phy_do_ioctl-and-convert-suitable-drivers'
David S. Miller [Tue, 21 Jan 2020 09:50:41 +0000 (10:50 +0100)]
Merge branch 'phy-add-new-version-of-phy_do_ioctl-and-convert-suitable-drivers'

Heiner Kallweit says:

====================
net: phy: add new version of phy_do_ioctl and convert suitable drivers

We just added phy_do_ioctl, but it turned out that we need another
version of this function that doesn't check whether net_device is
running. So rename phy_do_ioctl to phy_do_ioctl_running and add a
new version of phy_do_ioctl. Eventually convert suitable drivers
to use phy_do_ioctl.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: convert suitable network drivers to use phy_do_ioctl
Heiner Kallweit [Mon, 20 Jan 2020 21:18:37 +0000 (22:18 +0100)]
net: convert suitable network drivers to use phy_do_ioctl

Convert suitable network drivers to use phy_do_ioctl.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: phy: add new version of phy_do_ioctl
Heiner Kallweit [Mon, 20 Jan 2020 21:17:11 +0000 (22:17 +0100)]
net: phy: add new version of phy_do_ioctl

Add a new version of phy_do_ioctl that doesn't check whether net_device
is running. It will typically be used if suitable drivers attach the
PHY in probe already.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: phy: rename phy_do_ioctl to phy_do_ioctl_running
Heiner Kallweit [Mon, 20 Jan 2020 21:16:07 +0000 (22:16 +0100)]
net: phy: rename phy_do_ioctl to phy_do_ioctl_running

We just added phy_do_ioctl, but it turned out that we need another
version of this function that doesn't check whether net_device is
running. So rename phy_do_ioctl to phy_do_ioctl_running.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: hns3: replace snprintf with scnprintf in hns3_update_strings
Chen Zhou [Mon, 20 Jan 2020 12:50:33 +0000 (20:50 +0800)]
net: hns3: replace snprintf with scnprintf in hns3_update_strings

snprintf returns the number of bytes that would be written, which may be
greater than the the actual length to be written. Here use extra code to
handle this.

scnprintf returns the number of bytes that was actually written, just use
scnprintf to simplify the code.

Signed-off-by: Chen Zhou <chenzhou10@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: hns3: replace snprintf with scnprintf in hns3_dbg_cmd_read
Chen Zhou [Mon, 20 Jan 2020 12:49:43 +0000 (20:49 +0800)]
net: hns3: replace snprintf with scnprintf in hns3_dbg_cmd_read

The return value of snprintf may be greater than the size of
HNS3_DBG_READ_LEN, use scnprintf instead in hns3_dbg_cmd_read.

Signed-off-by: Chen Zhou <chenzhou10@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agoMerge tag 'rds-odp-for-5.5' of https://git.kernel.org/pub/scm/linux/kernel/git/leon...
David S. Miller [Tue, 21 Jan 2020 09:22:51 +0000 (10:22 +0100)]
Merge tag 'rds-odp-for-5.5' of https://git./linux/kernel/git/leon/linux-rdma

Leon Romanovsky says:

====================
Use ODP MRs for kernel ULPs

The following series extends MR creation routines to allow creation of
user MRs through kernel ULPs as a proxy. The immediate use case is to
allow RDS to work over FS-DAX, which requires ODP (on-demand-paging)
MRs to be created and such MRs were not possible to create prior this
series.

The first part of this patchset extends RDMA to have special verb
ib_reg_user_mr(). The common use case that uses this function is a
userspace application that allocates memory for HCA access but the
responsibility to register the memory at the HCA is on an kernel ULP.
This ULP acts as an agent for the userspace application.

The second part provides advise MR functionality for ULPs. This is
integral part of ODP flows and used to trigger pagefaults in advance
to prepare memory before running working set.

The third part is actual user of those in-kernel APIs.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agoMerge branch 'libbpf-include-path'
Alexei Starovoitov [Tue, 21 Jan 2020 00:37:46 +0000 (16:37 -0800)]
Merge branch 'libbpf-include-path'

Toke Høiland-Jørgensen says:

====================
We are currently being somewhat inconsistent with the libbpf include paths,
which makes it difficult to move files from the kernel into an external
libbpf-using project without adjusting include paths.

Having the bpf/ subdir of $INCLUDEDIR in the include path has never been a
requirement for building against libbpf before, and indeed the libbpf pkg-config
file doesn't include it. So let's make all libbpf includes across the kernel
tree use the bpf/ prefix in their includes. Since bpftool skeleton generation
emits code with a libbpf include, this also ensures that those can be used in
existing external projects using the regular pkg-config include path.

This turns out to be a somewhat invasive change in the number of files touched;
however, the actual changes to files are fairly trivial (most of them are simply
made with 'sed'). The series is split to make the change for one tool subdir at
a time, while trying not to break the build along the way. It is structured like
this:

- Patch 1-3: Trivial fixes to Makefiles for issues I discovered while changing
  the include paths.

- Patch 4-8: Change the include directives to use the bpf/ prefix, and updates
  Makefiles to make sure tools/lib/ is part of the include path, but without
  removing tools/lib/bpf

- Patch 9-11: Remove tools/lib/bpf from include paths to make sure we don't
  inadvertently re-introduce includes without the bpf/ prefix.

Changelog:

v5:
  - Combine the libbpf build rules in selftests Makefile (using Andrii's
    suggestion for a make rule).
  - Re-use self-tests libbpf build for runqslower (new patch 10)
  - Formatting fixes

v4:
  - Move runqslower error on missing BTF into make rule
  - Make sure we don't always force a rebuild selftests
  - Rebase on latest bpf-next (dropping patch 11)

v3:
  - Don't add the kernel build dir to the runqslower Makefile, pass it in from
    selftests instead.
  - Use libbpf's 'make install_headers' in selftests instead of trying to
    generate bpf_helper_defs.h in-place (to also work on read-only filesystems).
  - Use a scratch builddir for both libbpf and bpftool when building in selftests.
  - Revert bpf_helpers.h to quoted include instead of angled include with a bpf/
    prefix.
  - Fix a few style nits from Andrii

v2:
  - Do a full cleanup of libbpf includes instead of just changing the
    bpf_helper_defs.h include.
====================

Acked-by: Andrii Nakryiko <andriin@fb.com>
Tested-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
4 years agoselftests: Refactor build to remove tools/lib/bpf from include path
Toke Høiland-Jørgensen [Mon, 20 Jan 2020 13:06:52 +0000 (14:06 +0100)]
selftests: Refactor build to remove tools/lib/bpf from include path

To make sure no new files are introduced that doesn't include the bpf/
prefix in its #include, remove tools/lib/bpf from the include path
entirely.

Instead, we introduce a new header files directory under the scratch tools/
dir, and add a rule to run the 'install_headers' rule from libbpf to have a
full set of consistent libbpf headers in $(OUTPUT)/tools/include/bpf, and
then use $(OUTPUT)/tools/include as the include path for selftests.

For consistency we also make sure we put all the scratch build files from
other bpftool and libbpf into tools/build/, so everything stays within
selftests/.

Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/157952561246.1683545.2762245552022369203.stgit@toke.dk
4 years agorunsqslower: Support user-specified libbpf include and object paths
Toke Høiland-Jørgensen [Mon, 20 Jan 2020 13:06:51 +0000 (14:06 +0100)]
runsqslower: Support user-specified libbpf include and object paths

This adds support for specifying the libbpf include and object paths as
arguments to the runqslower Makefile, to support reusing the libbpf version
built as part of the selftests.

Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/157952561135.1683545.5660339645093141381.stgit@toke.dk
4 years agotools/runqslower: Remove tools/lib/bpf from include path
Toke Høiland-Jørgensen [Mon, 20 Jan 2020 13:06:50 +0000 (14:06 +0100)]
tools/runqslower: Remove tools/lib/bpf from include path

Since we are now consistently using the bpf/ prefix on #include directives,
we don't need to include tools/lib/bpf in the include path. Remove it to
make sure we don't inadvertently introduce new includes without the prefix.

Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/157952561027.1683545.1976265477926794138.stgit@toke.dk
4 years agosamples/bpf: Use consistent include paths for libbpf
Toke Høiland-Jørgensen [Mon, 20 Jan 2020 13:06:49 +0000 (14:06 +0100)]
samples/bpf: Use consistent include paths for libbpf

Fix all files in samples/bpf to include libbpf header files with the bpf/
prefix, to be consistent with external users of the library. Also ensure
that all includes of exported libbpf header files (those that are exported
on 'make install' of the library) use bracketed includes instead of quoted.

To make sure no new files are introduced that doesn't include the bpf/
prefix in its include, remove tools/lib/bpf from the include path entirely,
and use tools/lib instead.

Fixes: 6910d7d3867a ("selftests/bpf: Ensure bpf_helper_defs.h are taken from selftests dir")
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/157952560911.1683545.8795966751309534150.stgit@toke.dk
4 years agoperf: Use consistent include paths for libbpf
Toke Høiland-Jørgensen [Mon, 20 Jan 2020 13:06:48 +0000 (14:06 +0100)]
perf: Use consistent include paths for libbpf

Fix perf to include libbpf header files with the bpf/ prefix, to
be consistent with external users of the library.

Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/157952560797.1683545.7685921032671386301.stgit@toke.dk
4 years agobpftool: Use consistent include paths for libbpf
Toke Høiland-Jørgensen [Mon, 20 Jan 2020 13:06:46 +0000 (14:06 +0100)]
bpftool: Use consistent include paths for libbpf

Fix bpftool to include libbpf header files with the bpf/ prefix, to be
consistent with external users of the library. Also ensure that all
includes of exported libbpf header files (those that are exported on 'make
install' of the library) use bracketed includes instead of quoted.

To make sure no new files are introduced that doesn't include the bpf/
prefix in its include, remove tools/lib/bpf from the include path entirely,
and use tools/lib instead.

Fixes: 6910d7d3867a ("selftests/bpf: Ensure bpf_helper_defs.h are taken from selftests dir")
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/157952560684.1683545.4765181397974997027.stgit@toke.dk
4 years agoselftests: Use consistent include paths for libbpf
Toke Høiland-Jørgensen [Mon, 20 Jan 2020 13:06:45 +0000 (14:06 +0100)]
selftests: Use consistent include paths for libbpf

Fix all selftests to include libbpf header files with the bpf/ prefix, to
be consistent with external users of the library. Also ensure that all
includes of exported libbpf header files (those that are exported on 'make
install' of the library) use bracketed includes instead of quoted.

To not break the build, keep the old include path until everything has been
changed to the new one; a subsequent patch will remove that.

Fixes: 6910d7d3867a ("selftests/bpf: Ensure bpf_helper_defs.h are taken from selftests dir")
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/157952560568.1683545.9649335788846513446.stgit@toke.dk
4 years agotools/runqslower: Use consistent include paths for libbpf
Toke Høiland-Jørgensen [Mon, 20 Jan 2020 13:06:44 +0000 (14:06 +0100)]
tools/runqslower: Use consistent include paths for libbpf

Fix the runqslower tool to include libbpf header files with the bpf/
prefix, to be consistent with external users of the library. Also ensure
that all includes of exported libbpf header files (those that are exported
on 'make install' of the library) use bracketed includes instead of quoted.

To not break the build, keep the old include path until everything has been
changed to the new one; a subsequent patch will remove that.

Fixes: 6910d7d3867a ("selftests/bpf: Ensure bpf_helper_defs.h are taken from selftests dir")
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/157952560457.1683545.9913736511685743625.stgit@toke.dk
4 years agoselftests: Pass VMLINUX_BTF to runqslower Makefile
Toke Høiland-Jørgensen [Mon, 20 Jan 2020 13:06:43 +0000 (14:06 +0100)]
selftests: Pass VMLINUX_BTF to runqslower Makefile

Add a VMLINUX_BTF variable with the locally-built path when calling the
runqslower Makefile from selftests. This makes sure a simple 'make'
invocation in the selftests dir works even when there is no BTF information
for the running kernel. Do a wildcard expansion and include the same paths
for BTF for the running kernel as in the runqslower Makefile, to make it
possible to build selftests without having a vmlinux in the local tree.

Also fix the make invocation to use $(OUTPUT)/tools as the destination
directory instead of $(CURDIR)/tools.

Fixes: 3a0d3092a4ed ("selftests/bpf: Build runqslower from selftests")
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/157952560344.1683545.2723631988771664417.stgit@toke.dk
4 years agotools/bpf/runqslower: Fix override option for VMLINUX_BTF
Toke Høiland-Jørgensen [Mon, 20 Jan 2020 13:06:42 +0000 (14:06 +0100)]
tools/bpf/runqslower: Fix override option for VMLINUX_BTF

The runqslower tool refuses to build without a file to read vmlinux BTF
from. The build fails with an error message to override the location by
setting the VMLINUX_BTF variable if autodetection fails. However, the
Makefile doesn't actually work with that override - the error message is
still emitted.

Fix this by including the value of VMLINUX_BTF in the expansion, and only
emitting the error message if the *result* is empty. Also permit running
'make clean' even though no VMLINUX_BTF is set.

Fixes: 9c01546d26d2 ("tools/bpf: Add runqslower tool to tools/bpf")
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/157952560237.1683545.17771785178857224877.stgit@toke.dk
4 years agosamples/bpf: Don't try to remove user's homedir on clean
Toke Høiland-Jørgensen [Mon, 20 Jan 2020 13:06:41 +0000 (14:06 +0100)]
samples/bpf: Don't try to remove user's homedir on clean

The 'clean' rule in the samples/bpf Makefile tries to remove backup
files (ending in ~). However, if no such files exist, it will instead try
to remove the user's home directory. While the attempt is mostly harmless,
it does lead to a somewhat scary warning like this:

rm: cannot remove '~': Is a directory

Fix this by using find instead of shell expansion to locate any actual
backup files that need to be removed.

Fixes: b62a796c109c ("samples/bpf: allow make to be run from samples/bpf/ directory")
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Link: https://lore.kernel.org/bpf/157952560126.1683545.7273054725976032511.stgit@toke.dk
4 years agoselftests/bpf: Skip perf hw events test if the setup disabled it
Hangbin Liu [Fri, 17 Jan 2020 10:06:56 +0000 (18:06 +0800)]
selftests/bpf: Skip perf hw events test if the setup disabled it

The same with commit 4e59afbbed96 ("selftests/bpf: skip nmi test when perf
hw events are disabled"), it would make more sense to skip the
test_stacktrace_build_id_nmi test if the setup (e.g. virtual machines) has
disabled hardware perf events.

Fixes: 13790d1cc72c ("bpf: add selftest for stackmap with build_id in NMI context")
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20200117100656.10359-1-liuhangbin@gmail.com
4 years agoselftests/bpf: Don't check for btf fd in test_btf
Stanislav Fomichev [Sat, 18 Jan 2020 01:05:46 +0000 (17:05 -0800)]
selftests/bpf: Don't check for btf fd in test_btf

After commit 0d13bfce023a ("libbpf: Don't require root for
bpf_object__open()") we no longer load BTF during bpf_object__open(),
so let's remove the expectation from test_btf that the fd is not -1.
The test currently fails.

Before:
BTF libbpf test[1] (test_btf_haskv.o): do_test_file:4152:FAIL bpf_object__btf_fd: -1
BTF libbpf test[2] (test_btf_newkv.o): do_test_file:4152:FAIL bpf_object__btf_fd: -1
BTF libbpf test[3] (test_btf_nokv.o): do_test_file:4152:FAIL bpf_object__btf_fd: -1

After:
BTF libbpf test[1] (test_btf_haskv.o): OK
BTF libbpf test[2] (test_btf_newkv.o): OK
BTF libbpf test[3] (test_btf_nokv.o): OK

Fixes: 0d13bfce023a ("libbpf: Don't require root for bpf_object__open()")
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20200118010546.74279-1-sdf@google.com
4 years agobpf: Fix memory leaks in generic update/delete batch ops
Brian Vazquez [Sun, 19 Jan 2020 19:40:40 +0000 (11:40 -0800)]
bpf: Fix memory leaks in generic update/delete batch ops

Generic update/delete batch ops functions were using __bpf_copy_key
without properly freeing the memory. Handle the memory allocation and
copy_from_user separately.

Fixes: aa2e93b8e58e ("bpf: Add generic support for update and delete batch ops")
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Brian Vazquez <brianvv@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20200119194040.128369-1-brianvv@google.com
4 years agoMerge branch 'mlxsw-SPAN-egress-mirroring-buffer-size'
David S. Miller [Mon, 20 Jan 2020 12:25:47 +0000 (13:25 +0100)]
Merge branch 'mlxsw-SPAN-egress-mirroring-buffer-size'

Ido Schimmel says:

====================
mlxsw: Adjust SPAN egress mirroring buffer size handling for Spectrum-2

Jiri says:

For Spectrum-2 the computation of SPAN egress mirroring buffer uses a
different formula. On top of MTU it needs also current port speed. Fix
the computation and also trigger the buffer size set according to PUDE
event, which happens when port speed changes.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agospectrum: Add a delayed work to update SPAN buffsize according to speed
Jiri Pirko [Mon, 20 Jan 2020 07:52:53 +0000 (09:52 +0200)]
spectrum: Add a delayed work to update SPAN buffsize according to speed

When PUDE event is handled and the link is up, update the port SPAN
buffer size according to the current speed.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agomlxsw: spectrum: Fix SPAN egress mirroring buffer size for Spectrum-2
Jiri Pirko [Mon, 20 Jan 2020 07:52:52 +0000 (09:52 +0200)]
mlxsw: spectrum: Fix SPAN egress mirroring buffer size for Spectrum-2

For SPAN egress mirroring buffer size, it is needed to use a different
formula for Spectrum and Spectrum-2. Move the buffer size computation to
ops and implement new formula for Spectrum-2.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agomlxsw: spectrum_span: Put buffsize update code into helper function
Jiri Pirko [Mon, 20 Jan 2020 07:52:51 +0000 (09:52 +0200)]
mlxsw: spectrum_span: Put buffsize update code into helper function

Avoid duplication of code that is doing buffsize update and put it into
a separate helper function.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agomlxsw: spectrum: Push code getting port speed into a helper
Jiri Pirko [Mon, 20 Jan 2020 07:52:50 +0000 (09:52 +0200)]
mlxsw: spectrum: Push code getting port speed into a helper

Currently PTP code queries directly PTYS register for port speed from
work scheduled upon PUDE event. Since the speed needs to be used for
SPAN buffer size computation as well, push the code into a separate
helper.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agoMerge branch 'net-phy-add-generic-ndo_do_ioctl-handler-phy_do_ioctl'
David S. Miller [Mon, 20 Jan 2020 09:43:24 +0000 (10:43 +0100)]
Merge branch 'net-phy-add-generic-ndo_do_ioctl-handler-phy_do_ioctl'

Heiner Kallweit says:

====================
net: phy: add generic ndo_do_ioctl handler phy_do_ioctl

A number of network drivers has the same glue code to use phy_mii_ioctl
as ndo_do_ioctl handler. So let's add such a generic ndo_do_ioctl
handler to phylib. As first user convert r8169.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agor8169: use generic ndo_do_ioctl handler phy_do_ioctl
Heiner Kallweit [Sun, 19 Jan 2020 13:32:49 +0000 (14:32 +0100)]
r8169: use generic ndo_do_ioctl handler phy_do_ioctl

Replace rtl8169_ioctl with new generic function phy_do_ioctl.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: phy: add generic ndo_do_ioctl handler phy_do_ioctl
Heiner Kallweit [Sun, 19 Jan 2020 13:31:55 +0000 (14:31 +0100)]
net: phy: add generic ndo_do_ioctl handler phy_do_ioctl

A number of network drivers has the same glue code to use phy_mii_ioctl
as ndo_do_ioctl handler. So let's add such a generic ndo_do_ioctl
handler to phylib.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: dsa: mv88e6xxx: Add SERDES stats counters to all 6390 family members
Andrew Lunn [Sat, 18 Jan 2020 18:40:56 +0000 (19:40 +0100)]
net: dsa: mv88e6xxx: Add SERDES stats counters to all 6390 family members

The SERDES statistics are valid for all members of the 6390 family,
not just the 6390 itself. Add the needed callbacks to all members of
the family.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: phylink: allow in-band AN for USXGMII
Alex Marginean [Sat, 18 Jan 2020 12:19:15 +0000 (14:19 +0200)]
net: phylink: allow in-band AN for USXGMII

USXGMII supports passing link information in-band between PHY and MAC PCS,
add it to the list of protocols that support in-band AN mode.

Being a MAC-PHY protocol that can auto-negotiate link speeds up to 10
Gbps, we populate the initial supported mask with the entire spectrum of
link modes up to 10G that PHYLINK supports, and we let the driver reduce
that mask in its .phylink_validate method.

Signed-off-by: Alex Marginean <alexandru.marginean@nxp.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: phy: don't crash in phy_read/_write_mmd without a PHY driver
Alex Marginean [Thu, 16 Jan 2020 17:46:28 +0000 (19:46 +0200)]
net: phy: don't crash in phy_read/_write_mmd without a PHY driver

The APIs can be used by Ethernet drivers without actually loading a PHY
driver. This may become more widespread in the future with 802.3z
compatible MAC PCS devices being locally driven by the MAC driver when
configuring for a PHY mode with in-band negotiation.

Check that drv is not NULL before reading from it.

Signed-off-by: Alex Marginean <alexandru.marginean@nxp.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: phylink: Allow 2.5BASE-T, 5GBASE-T and 10GBASE-T for the 10G link modes
Vladimir Oltean [Thu, 16 Jan 2020 17:36:56 +0000 (19:36 +0200)]
net: phylink: Allow 2.5BASE-T, 5GBASE-T and 10GBASE-T for the 10G link modes

For some reason, PHYLINK does not put the copper modes for 802.3bz
(NBASE-T) and 802.3an-2006 (10GBASE-T) in the PHY's supported mask, when
the PHY-MAC connection is a 10G-capable one (10GBase-KR, 10GBase-R,
USXGMII). One possible way through which the cable side can work at the
lower speed is by having the PHY emit PAUSE frames towards the MAC. So
fix that omission.

Also include the 2500Base-X fiber mode in this list while we're at it.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: stmmac: modified pcs mode support for RGMII
Dejin Zheng [Wed, 15 Jan 2020 15:53:23 +0000 (23:53 +0800)]
net: stmmac: modified pcs mode support for RGMII

snps databook noted that physical coding sublayer (PCS) interface
that can be used when the MAC is configured for the TBI, RTBI, or
SGMII PHY interface. we have RGMII and SGMII in a SoC and it also
has the PCS block. it needs stmmac_init_phy and stmmac_mdio_register
function for initializing phy when it used RGMII interface.

Signed-off-by: Dejin Zheng <zhengdejin5@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agoMerge ra.kernel.org:/pub/scm/linux/kernel/git/netdev/net
David S. Miller [Sun, 19 Jan 2020 21:10:04 +0000 (22:10 +0100)]
Merge ra./pub/scm/linux/kernel/git/netdev/net

4 years agoMerge tag 'riscv/for-v5.5-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv...
Linus Torvalds [Sun, 19 Jan 2020 20:10:28 +0000 (12:10 -0800)]
Merge tag 'riscv/for-v5.5-rc7' of git://git./linux/kernel/git/riscv/linux

Pull RISC-V fixes from Paul Walmsley:
 "Three fixes for RISC-V:

   - Don't free and reuse memory containing the code that CPUs parked at
     boot reside in.

   - Fix rv64 build problems for ubsan and some modules by adding
     logical and arithmetic shift helpers for 128-bit values. These are
     from libgcc and are similar to what's present for ARM64.

   - Fix vDSO builds to clean up their own temporary files"

* tag 'riscv/for-v5.5-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
  riscv: Less inefficient gcc tishift helpers (and export their symbols)
  riscv: delete temporary files
  riscv: make sure the cores stay looping in .Lsecondary_park

4 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Linus Torvalds [Sun, 19 Jan 2020 20:03:53 +0000 (12:03 -0800)]
Merge git://git./linux/kernel/git/netdev/net

Pull networking fixes from David Miller:

 1) Fix non-blocking connect() in x25, from Martin Schiller.

 2) Fix spurious decryption errors in kTLS, from Jakub Kicinski.

 3) Netfilter use-after-free in mtype_destroy(), from Cong Wang.

 4) Limit size of TSO packets properly in lan78xx driver, from Eric
    Dumazet.

 5) r8152 probe needs an endpoint sanity check, from Johan Hovold.

 6) Prevent looping in tcp_bpf_unhash() during sockmap/tls free, from
    John Fastabend.

 7) hns3 needs short frames padded on transmit, from Yunsheng Lin.

 8) Fix netfilter ICMP header corruption, from Eyal Birger.

 9) Fix soft lockup when low on memory in hns3, from Yonglong Liu.

10) Fix NTUPLE firmware command failures in bnxt_en, from Michael Chan.

11) Fix memory leak in act_ctinfo, from Eric Dumazet.

* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (91 commits)
  cxgb4: reject overlapped queues in TC-MQPRIO offload
  cxgb4: fix Tx multi channel port rate limit
  net: sched: act_ctinfo: fix memory leak
  bnxt_en: Do not treat DSN (Digital Serial Number) read failure as fatal.
  bnxt_en: Fix ipv6 RFS filter matching logic.
  bnxt_en: Fix NTUPLE firmware command failures.
  net: systemport: Fixed queue mapping in internal ring map
  net: dsa: bcm_sf2: Configure IMP port for 2Gb/sec
  net: dsa: sja1105: Don't error out on disabled ports with no phy-mode
  net: phy: dp83867: Set FORCE_LINK_GOOD to default after reset
  net: hns: fix soft lockup when there is not enough memory
  net: avoid updating qdisc_xmit_lock_key in netdev_update_lockdep_key()
  net/sched: act_ife: initalize ife->metalist earlier
  netfilter: nat: fix ICMP header corruption on ICMP errors
  net: wan: lapbether.c: Use built-in RCU list checking
  netfilter: nf_tables: fix flowtable list del corruption
  netfilter: nf_tables: fix memory leak in nf_tables_parse_netdev_hooks()
  netfilter: nf_tables: remove WARN and add NLA_STRING upper limits
  netfilter: nft_tunnel: ERSPAN_VERSION must not be null
  netfilter: nft_tunnel: fix null-attribute check
  ...

4 years agoMerge branch 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa...
Linus Torvalds [Sun, 19 Jan 2020 20:02:06 +0000 (12:02 -0800)]
Merge branch 'i2c/for-current' of git://git./linux/kernel/git/wsa/linux

Pull i2c fixes from Wolfram Sang:
 "Two runtime PM fixes and one leak fix"

* 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
  i2c: iop3xx: Fix memory leak in probe error path
  i2c: tegra: Properly disable runtime PM on driver's probe error
  i2c: tegra: Fix suspending in active runtime PM state

4 years agoMerge branch 'mlxsw-Add-tunnel-devlink-trap-support'
David S. Miller [Sun, 19 Jan 2020 15:23:53 +0000 (16:23 +0100)]
Merge branch 'mlxsw-Add-tunnel-devlink-trap-support'

Ido Schimmel says:

====================
mlxsw: Add tunnel devlink-trap support

This patch set from Amit adds support in mlxsw for tunnel traps and a
few additional layer 3 traps that can report drops and exceptions via
devlink-trap.

These traps allow the user to more quickly diagnose problems relating to
tunnel decapsulation errors, such as packet being too short to
decapsulate or a packet containing wrong GRE key in its GRE header.

Patch set overview:

Patches #1-#4 add three additional layer 3 traps. Two of which are
mlxsw-specific as they relate to hardware-specific errors. The patches
include documentation of each trap and selftests.

Patches #5-#8 are preparations. They ensure that the correct ECN bits
are set in the outer header during IPinIP encapsulation and that packets
with an invalid ECN combination in underlay and overlay are trapped to
the kernel and not decapsulated in hardware.

Patches #9-#15 add support for two tunnel related traps. Each trap is
documented and selftested using both VXLAN and IPinIP tunnels, if
applicable.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agoselftests: devlink_trap_tunnel_vxlan: Add test case for overlay_smac_is_mc
Amit Cohen [Sun, 19 Jan 2020 13:01:00 +0000 (15:01 +0200)]
selftests: devlink_trap_tunnel_vxlan: Add test case for overlay_smac_is_mc

Test that the trap is triggered under the right conditions and that
devlink counters increase when action is trap.

Signed-off-by: Amit Cohen <amitc@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agomlxsw: Add OVERLAY_SMAC_MC trap
Amit Cohen [Sun, 19 Jan 2020 13:00:59 +0000 (15:00 +0200)]
mlxsw: Add OVERLAY_SMAC_MC trap

Add a trap for NVE packets that the device decided to drop because their
overlay source MAC is multicast.

Signed-off-by: Amit Cohen <amitc@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agodevlink: Add overlay source MAC is multicast trap
Amit Cohen [Sun, 19 Jan 2020 13:00:58 +0000 (15:00 +0200)]
devlink: Add overlay source MAC is multicast trap

Add packet trap that can report NVE packets that the device decided to
drop because their overlay source MAC is multicast.

Signed-off-by: Amit Cohen <amitc@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agoselftests: devlink_trap_tunnel_ipip: Add test case for decap_error
Amit Cohen [Sun, 19 Jan 2020 13:00:57 +0000 (15:00 +0200)]
selftests: devlink_trap_tunnel_ipip: Add test case for decap_error

Test that the trap is triggered under the right conditions and that
devlink counters increase.

Signed-off-by: Amit Cohen <amitc@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agoselftests: devlink_trap_tunnel_vxlan: Add test case for decap_error
Amit Cohen [Sun, 19 Jan 2020 13:00:56 +0000 (15:00 +0200)]
selftests: devlink_trap_tunnel_vxlan: Add test case for decap_error

Test that the trap is triggered under the right conditions and that
devlink counters increase.

Signed-off-by: Amit Cohen <amitc@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agomlxsw: Add tunnel devlink-trap support
Amit Cohen [Sun, 19 Jan 2020 13:00:55 +0000 (15:00 +0200)]
mlxsw: Add tunnel devlink-trap support

Add the trap IDs and trap group used to report tunnel drops. Register
tunnel packet traps and associated tunnel trap group with devlink
during driver initialization.

Signed-off-by: Amit Cohen <amitc@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agodevlink: Add tunnel generic packet traps
Amit Cohen [Sun, 19 Jan 2020 13:00:54 +0000 (15:00 +0200)]
devlink: Add tunnel generic packet traps

Add packet traps that can report packets that were dropped during tunnel
decapsulation.

Signed-off-by: Amit Cohen <amitc@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agomlxsw: spectrum_trap: Reorder cases according to enum order
Amit Cohen [Sun, 19 Jan 2020 13:00:53 +0000 (15:00 +0200)]
mlxsw: spectrum_trap: Reorder cases according to enum order

Move L3_DROPS case to appear after L2_DROPS case.

Signed-off-by: Amit Cohen <amitc@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agomlxsw: Add ECN configurations with IPinIP tunnels
Amit Cohen [Sun, 19 Jan 2020 13:00:52 +0000 (15:00 +0200)]
mlxsw: Add ECN configurations with IPinIP tunnels

Initialize ECN mapping registers during router init according to
INET_ECN_encapsulate() and INET_ECN_decapsulate().

For IP-in-IP encapsulation, this is required to ensure that ECN bits in
the underlay are set in accordance with the kernel. For decapsulation,
this is required to ensure that packets with invalid ECN combination in
underlay and overlay are trapped to the kernel and not forwarded.

Signed-off-by: Amit Cohen <amitc@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agomlxsw: reg: Add Tunneling IPinIP Decapsulation ECN Mapping Register
Amit Cohen [Sun, 19 Jan 2020 13:00:51 +0000 (15:00 +0200)]
mlxsw: reg: Add Tunneling IPinIP Decapsulation ECN Mapping Register

This register configures the actions that are done during IPinIP
decapsulation based on the ECN bits.

Signed-off-by: Amit Cohen <amitc@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agomlxsw: reg: Add Tunneling IPinIP Encapsulation ECN Mapping Register
Amit Cohen [Sun, 19 Jan 2020 13:00:50 +0000 (15:00 +0200)]
mlxsw: reg: Add Tunneling IPinIP Encapsulation ECN Mapping Register

This register performs mapping from overlay ECN to underlay ECN during
IPinIP encapsulation.

Signed-off-by: Amit Cohen <amitc@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agomlxsw: Add NON_ROUTABLE trap
Amit Cohen [Sun, 19 Jan 2020 13:00:49 +0000 (15:00 +0200)]
mlxsw: Add NON_ROUTABLE trap

Add a trap for packets that the device decided to drop because they are
not supposed to be routed. For example, IGMP queries can be flooded by
the device in layer 2 and reach the router. Such packets should not be
routed and instead dropped.

Signed-off-by: Amit Cohen <amitc@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agodevlink: Add non-routable packet trap
Amit Cohen [Sun, 19 Jan 2020 13:00:48 +0000 (15:00 +0200)]
devlink: Add non-routable packet trap

Add packet trap that can report packets that reached the router, but are
non-routable. For example, IGMP queries can be flooded by the device in
layer 2 and reach the router. Such packets should not be routed and
instead dropped.

Signed-off-by: Amit Cohen <amitc@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agoselftests: devlink_trap_l3_drops: Add test cases of irif and erif disabled
Amit Cohen [Sun, 19 Jan 2020 13:00:47 +0000 (15:00 +0200)]
selftests: devlink_trap_l3_drops: Add test cases of irif and erif disabled

Add test cases to check that packets routed through disabled RIFs and
packets routed from disabled RIFs are dropped and devlink counters
increase when the action is trap.

Signed-off-by: Amit Cohen <amitc@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agomlxsw: Add irif and erif disabled traps
Amit Cohen [Sun, 19 Jan 2020 13:00:46 +0000 (15:00 +0200)]
mlxsw: Add irif and erif disabled traps

IRIF_DISABLED and ERIF_DISABLED are driver specific traps. Packets are
dropped for these reasons when they need to be routed through/from
existing router interfaces (RIF) which are disabled.

Add devlink driver-specific traps and mlxsw trap IDs used to report
these traps.

Signed-off-by: Amit Cohen <amitc@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>