platform/kernel/linux-rpi.git
John Fastabend [Thu, 20 Dec 2018 19:35:32 +0000 (11:35 -0800)]
bpf: skb_verdict, support SK_PASS on RX BPF path

Add SK_PASS verdict support to SK_SKB_VERDICT programs. Now that
support for redirects exists we can implement SK_PASS as a redirect
to the same socket. This simplifies the BPF programs and avoids an
extra map lookup on RX path for simple visibility cases.

Further, this reduces user (BPF programmer in this context) confusion
when their program drops an skb due to lack of support.

Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
John Fastabend [Thu, 20 Dec 2018 19:35:31 +0000 (11:35 -0800)]
bpf: skmsg, replace comments with BUILD bug

Enforce comment on structure layout dependency with a BUILD_BUG_ON
to ensure the condition is maintained.

Suggested-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
John Fastabend [Thu, 20 Dec 2018 19:35:30 +0000 (11:35 -0800)]
bpf: sk_msg, improve offset chk in _is_valid_access

The check for max offset in sk_msg_is_valid_access uses sizeof()
which is incorrect because it would allow accessing possibly
past the end of the struct in the padded case. Further, it doesn't
preclude accessing any padding that may be added in the middle of
a struct. All told this makes it fragile to rely on.

To fix this, explicitly check offsets against fields using the
bpf_ctx_range() and bpf_ctx_range_till() macros.

For reference the current structure layout looks as follows (reported
by pahole)

struct sk_msg_md {
        union {
                void *             data;                 /*           8 */
        };                                               /*     0     8 */
        union {
                void *             data_end;             /*           8 */
        };                                               /*     8     8 */
        __u32                      family;               /*    16     4 */
        __u32                      remote_ip4;           /*    20     4 */
        __u32                      local_ip4;            /*    24     4 */
        __u32                      remote_ip6[4];        /*    28    16 */
        __u32                      local_ip6[4];         /*    44    16 */
        __u32                      remote_port;          /*    60     4 */
        /* --- cacheline 1 boundary (64 bytes) --- */
        __u32                      local_port;           /*    64     4 */
        __u32                      size;                 /*    68     4 */

        /* size: 72, cachelines: 2, members: 10 */
        /* last cacheline: 8 bytes */
};

So there should be no padding at the moment but fixing this now
prevents future errors.
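
As a rough illustration of why a sizeof() bound is weaker than per-field
range checks, the sketch below uses a made-up user-space struct (demo_md,
not the kernel's sk_msg_md) that does have trailing padding. All names here
are hypothetical, and IN_FIELD only imitates the idea behind bpf_ctx_range();
it is not the kernel macro.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical context struct; the u16 tail leaves trailing padding. */
struct demo_md {
	uint64_t data;   /* offset 0, size 8 */
	uint32_t family; /* offset 8, size 4 */
	uint16_t port;   /* offset 12, size 2, followed by 2 padding bytes */
};

/* Fragile check: any access that fits inside sizeof() is accepted. */
static bool access_ok_sizeof(size_t off, size_t size)
{
	return off + size <= sizeof(struct demo_md);
}

/* True if [off, off + size) lies entirely inside the named field. */
#define IN_FIELD(off, size, field)                               \
	((off) >= offsetof(struct demo_md, field) &&             \
	 (off) + (size) <= offsetof(struct demo_md, field) +     \
			   sizeof(((struct demo_md *)0)->field))

/* Stricter check: the access must land inside one known field. */
static bool access_ok_fields(size_t off, size_t size)
{
	return IN_FIELD(off, size, data) ||
	       IN_FIELD(off, size, family) ||
	       IN_FIELD(off, size, port);
}
```

In this model a 2-byte read at offset 14 passes the sizeof() check even
though it only touches padding, while the per-field check rejects it.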

Reported-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
John Fastabend [Thu, 20 Dec 2018 19:35:29 +0000 (11:35 -0800)]
bpf: sk_msg, fix sk_msg_md access past end test

Currently, the test that ensures reads past the end of the sk_msg_md
data structure fail incorrectly expects success. Fix this
typo and use the correct expected error.

Fixes: 945a47d87cee ("bpf: sk_msg, add tests for size field")
Reported-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Jesper Dangaard Brouer [Wed, 19 Dec 2018 16:00:23 +0000 (17:00 +0100)]
bpf/cpumap: make sure frame_size for build_skb is aligned if headroom isn't

The frame_size passed to build_skb must be aligned, else it is
possible that the embedded struct skb_shared_info gets unaligned.

For correctness, make sure that xdpf->headroom is included in the
alignment. No upstream drivers can hit this, as all XDP drivers provide
an aligned headroom.  This was discovered when playing with implementing
XDP support for mvneta, which has a 2-byte DSA header, and this
Marvell ARM64 platform didn't like doing atomic operations on an
unaligned skb_shinfo(skb)->dataref address.
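
The effect of including the headroom in the rounding can be modelled in
user space as below. This is only a sketch: DEMO_ALIGN is an assumed
alignment standing in for the kernel's cache-line based rounding, and the
function names are hypothetical rather than taken from the cpumap code.

```c
#include <stddef.h>

#define DEMO_ALIGN 64u /* assumed alignment for this model */

/* Round n up to the next multiple of DEMO_ALIGN (a power of two). */
static size_t demo_align_up(size_t n)
{
	return (n + DEMO_ALIGN - 1) & ~(size_t)(DEMO_ALIGN - 1);
}

/* Headroom added after rounding: the sum inherits any misalignment. */
static size_t demo_end_headroom_outside(size_t len, size_t headroom)
{
	return demo_align_up(len) + headroom;
}

/* Headroom included in the rounding: the sum is always aligned. */
static size_t demo_end_headroom_inside(size_t len, size_t headroom)
{
	return demo_align_up(len + headroom);
}
```

With an aligned headroom such as 256 both variants agree; with a headroom
of 258 (256 plus a 2-byte header) only the second one stays on a
DEMO_ALIGN boundary.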

Fixes: 1c601d829ab0 ("bpf: cpumap xdp_buff to skb conversion and allocation")
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Daniel Borkmann [Thu, 20 Dec 2018 16:28:29 +0000 (17:28 +0100)]
Merge branch 'bpf-jset-verifier'

Jakub Kicinski says:

====================
This is a v2 of the patch set to teach the verifier about the BPF_JSET
instruction.  There are also a number of tests included for both
basic functioning of the instruction and the verifier logic.
The NFP JIT handling of JSET is tweaked.  Last patch adds missing
file to gitignore.

Reposting part of previous series without the dead code elimination.
====================

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Jakub Kicinski [Thu, 20 Dec 2018 06:13:09 +0000 (22:13 -0800)]
selftests: bpf: add missing executables to .gitignore

commit 435f90a338ae ("selftests/bpf: add a test case for sock_ops
perf-event notification") missed adding the new test to .gitignore.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Jakub Kicinski [Thu, 20 Dec 2018 06:13:08 +0000 (22:13 -0800)]
nfp: bpf: optimize codegen for JSET with a constant

The top word of the constant can only have bits set if sign
extension set it to all-1, therefore we don't really have to
mask the top half of the register.  We can just OR it into
the result as is.
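
The argument above can be checked with a small user-space model. The
sketch assumes the usual BPF semantics of a 32-bit immediate sign-extended
to 64 bits; the function names are hypothetical and this is not the NFP
JIT code itself.

```c
#include <stdbool.h>
#include <stdint.h>

/* Reference semantics: the immediate is sign-extended to 64 bits. */
static bool jset_ref(uint64_t dst, int32_t imm)
{
	return (dst & (uint64_t)(int64_t)imm) != 0;
}

/*
 * Word-wise version: only the low half is masked. The high word of the
 * sign-extended constant is either all-0 or all-1, so the high half of
 * dst is either ignored or ORed in unmasked.
 */
static bool jset_split(uint64_t dst, int32_t imm)
{
	uint32_t lo = (uint32_t)dst;
	uint32_t hi = (uint32_t)(dst >> 32);
	uint32_t res = lo & (uint32_t)imm;

	if (imm < 0)
		res |= hi;
	return res != 0;
}
```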

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Jakub Kicinski [Thu, 20 Dec 2018 06:13:07 +0000 (22:13 -0800)]
nfp: bpf: remove the trivial JSET optimization

The verifier will now understand the JSET instruction, so don't
mark the dead branch in the JIT as noop.  We won't generate any
code, anyway.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Jakub Kicinski [Thu, 20 Dec 2018 06:13:06 +0000 (22:13 -0800)]
bpf: verifier: reorder stack size check with dead code sanitization

Reorder the calls to check_max_stack_depth() and sanitize_dead_code()
to separate functions which can rewrite instructions from pure checks.

No functional changes.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Jiong Wang <jiong.wang@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Jakub Kicinski [Thu, 20 Dec 2018 06:13:05 +0000 (22:13 -0800)]
selftests: bpf: verifier: add tests for JSET interpretation

Validate that the verifier reasons correctly about the bounds
and removes dead code based on results of JSET instruction.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Jakub Kicinski [Thu, 20 Dec 2018 06:13:04 +0000 (22:13 -0800)]
bpf: verifier: teach the verifier to reason about the BPF_JSET instruction

Some JITs (nfp) try to optimize code on their own.  It could make
sense in case of the BPF_JSET instruction, which is currently not
interpreted by the verifier, meaning for instance that dead code would
not be detected if it was under a BPF_JSET branch.

Teach the verifier basics of BPF_JSET, JIT optimizations will be
removed shortly.
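
One way such reasoning can work is sketched below, assuming the register
is tracked as known and unknown bits. The demo_tnum type and the function
name are hypothetical, and the commit's actual implementation may differ
in detail.

```c
#include <stdint.h>

/* Tracked-bits model: bits set in mask are unknown, the rest equal value. */
struct demo_tnum {
	uint64_t value;
	uint64_t mask;
};

/*
 * Decide "if (reg & val) goto target" statically:
 * returns 1 if always taken, 0 if never taken, -1 if unknown.
 */
static int demo_jset_branch_taken(struct demo_tnum reg, uint64_t val)
{
	/* A bit known to be 1 overlaps val: the test is always true. */
	if ((reg.value & ~reg.mask) & val)
		return 1;
	/* No bit that could possibly be 1 overlaps val: always false. */
	if (!((reg.value | reg.mask) & val))
		return 0;
	return -1;
}
```

When the result is 0 or 1, the other side of the branch is dead code and
can be treated as such by later passes.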

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Jiong Wang <jiong.wang@netronome.com>
Acked-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Jakub Kicinski [Thu, 20 Dec 2018 06:13:03 +0000 (22:13 -0800)]
selftests: bpf: add trivial JSET tests

We seem to have no JSET instruction test, and LLVM does not
generate it at all, so let's add a simple hand-coded test
to make sure JIT implementations are correct.

v2:
 - extend test_verifier to handle multiple inputs and
   add the sample there (Daniel)
 - add a sign extension case

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Martin KaFai Lau [Wed, 19 Dec 2018 21:30:54 +0000 (13:30 -0800)]
bpf: sparc64: Enable sparc64 jit to provide bpf_line_info

This patch enables sparc64's bpf_int_jit_compile() to provide
bpf_line_info by calling bpf_prog_fill_jited_linfo().

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Alexei Starovoitov [Wed, 19 Dec 2018 23:42:55 +0000 (15:42 -0800)]
Merge branch 'line_info-check-for-ld_imm64'

Martin KaFai Lau says:

====================
This series ensures the line_info (passed by the userspace during
bpf_prog_load) cannot have its line_info.insn_off pointing to a
zero bpf insn code.  F.e. a broken userspace tool might
generate a line_info.insn_off that points to the second
8 bytes of a BPF_LD_IMM64.

The first patch is the kernel change.
The second patch is a new test case.
====================

Acked-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Martin KaFai Lau [Wed, 19 Dec 2018 21:01:02 +0000 (13:01 -0800)]
bpf: Add BPF_LD_IMM64 to the line_info test

This patch adds a BPF_LD_IMM64 case to the line_info test
to ensure the kernel rejects line_info.insn_off pointing
to the 2nd 8 bytes of the BPF_LD_IMM64.

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Martin KaFai Lau [Wed, 19 Dec 2018 21:01:01 +0000 (13:01 -0800)]
bpf: Ensure line_info.insn_off cannot point to insn with zero code

This patch rejects a line_info if the bpf insn code referred by
line_info.insn_off is 0. F.e. a broken userspace tool might generate
a line_info.insn_off that points to the second 8 bytes of a BPF_LD_IMM64.
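
The rule can be modelled as follows. This is a sketch only: demo_insn is a
cut-down stand-in for struct bpf_insn, and the helper name is hypothetical.
A BPF_LD_IMM64 occupies two 8-byte instruction slots, and the second slot
has a zero opcode byte.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Minimal stand-in for struct bpf_insn: only the opcode byte matters. */
struct demo_insn {
	uint8_t code;
};

#define DEMO_LD_IMM64 0x18 /* BPF_LD | BPF_IMM | BPF_DW */
#define DEMO_EXIT     0x95 /* BPF_JMP | BPF_EXIT */

/* Accept insn_off only if it is in range and names a real instruction. */
static bool demo_line_info_off_ok(const struct demo_insn *insns,
				  size_t cnt, size_t insn_off)
{
	return insn_off < cnt && insns[insn_off].code != 0;
}
```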

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Ivan Babrou [Wed, 19 Dec 2018 20:08:03 +0000 (12:08 -0800)]
tools: bpftool: do not force gcc as CC

This allows transparent cross-compilation with CROSS_COMPILE by
relying on 7ed1c1901fe5 ("tools: fix cross-compile var clobbering").

Signed-off-by: Ivan Babrou <ivan@cloudflare.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Björn Töpel [Wed, 19 Dec 2018 12:09:31 +0000 (13:09 +0100)]
xsk: simplify AF_XDP socket teardown

Prior to this commit, when the struct socket object was being released,
the UMEM did not have its reference count decreased. Instead, this was
done in the struct sock sk_destruct function.

There is no reason to keep the UMEM reference around when the socket
is being orphaned, so in this patch xdp_put_umem is called in the
xsk_release function. As a result, the xsk_destruct function can be
removed.

Note that a struct xsk_sock reference might still linger in the
XSKMAP after the UMEM is released, e.g. if a user does not clear the
XSKMAP prior to closing the process. This sock will be in a
"released", zombie-like state until the XSKMAP is removed.

Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Yonghong Song [Tue, 18 Dec 2018 21:43:58 +0000 (13:43 -0800)]
bpf: log struct/union attribute for forward type

Current btf internal verbose logger logs fwd type as
  [2] FWD A type_id=0
where A is the type name.

Commit 9d5f9f701b18 ("bpf: btf: fix struct/union/fwd types
with kind_flag") introduced kind_flag which can be used
to distinguish whether a forward type is a struct or
union.

Also, "type_id=0" does not carry any meaningful
information for fwd type as btf_type.type = 0 is simply
enforced during btf verification and is not used
anywhere else.

This commit changed the log to
  [2] FWD A struct
if kind_flag = 0, or
  [2] FWD A union
if kind_flag = 1.

Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Daniel Borkmann [Tue, 18 Dec 2018 23:27:24 +0000 (00:27 +0100)]
Merge branch 'bpf-sk-msg-size-member'

John Fastabend says:

====================
This adds a size field to the sk_msg_md data structure used by SK_MSG
programs. Without this, in the zerocopy case and in the copy case
where multiple iovs are in use, it's difficult to know how much data
can be pulled in. The normal method of reading data and data_end
only gives the current contiguous buffer. BPF programs can attempt to
pull in extra data but have to guess if it exists. This can result
in multiple "guesses"; it's much better if we know the size of the
sk_msg upfront.
====================

Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
John Fastabend [Sun, 16 Dec 2018 23:47:06 +0000 (15:47 -0800)]
bpf: sk_msg, add tests for size field

This adds tests to read the size field to test_verifier.

Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
John Fastabend [Sun, 16 Dec 2018 23:47:05 +0000 (15:47 -0800)]
bpf: add tools lib/include support sk_msg_md size field

Add the size field to sk_msg_md for tools.

Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
John Fastabend [Sun, 16 Dec 2018 23:47:04 +0000 (15:47 -0800)]
bpf: sockmap, metadata support for reporting size of msg

This adds metadata to sk_msg_md for BPF programs to read the sk_msg
size.

When the SK_MSG program is running under an application that is using
sendfile the data is not copied into sk_msg buffers by default. Rather
the BPF program uses sk_msg_pull_data to read the bytes in. This
avoids doing the costly memcopy instructions when they are not in
fact needed. However, if we don't know the size of the sk_msg we
have to guess if needed bytes are available by doing a pull request
which may fail. By including the size of the sk_msg BPF programs can
check the size before issuing sk_msg_pull_data requests.

Additionally, the same applies to sendmsg calls when the application
provides multiple iovs. Here the BPF program needs to pull in data
to update data pointers, but it's not clear where the data ends without
a size parameter. In many cases "guessing" is not easy to do
and results in multiple calls to pull, and without bounded loops
everything gets fairly tricky.

Clean this up by including a u32 size field. Note, all writes into
sk_msg_md are rejected already from sk_msg_is_valid_access so nothing
additional is needed there.

Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Jiong Wang [Sat, 15 Dec 2018 08:34:40 +0000 (03:34 -0500)]
bpf: correct slot_type marking logic to allow more stack slot sharing

The verifier is supposed to support sharing a stack slot allocated to a
ptr with a SCALAR_VALUE for privileged programs. However, this doesn't
happen in some cases.

The reason is that the verifier does not clear slot_type STACK_SPILL for
all bytes; it only clears part of them, while the verifier uses:

  slot_type[0] == STACK_SPILL

as a convention to check whether a slot is of ptr type.

So, the consequence of partially clearing slot_type is that the verifier
could still treat a partially overridden ptr slot, which should now be a
SCALAR_VALUE slot, as a ptr slot, and reject some valid programs.
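
A toy model of the problem, under simplifying assumptions: the DEMO_*
names are hypothetical, and the byte indexing is deliberately simpler than
the verifier's real stack layout. It only shows how a partial clear can
leave slot_type[0] marked as a spill.

```c
#include <stdint.h>

#define DEMO_REG_SIZE 8 /* bytes per stack slot in this model */

enum demo_slot_type { DEMO_INVALID, DEMO_SPILL, DEMO_MISC };

/* The convention quoted above: byte 0 alone decides "this is a ptr". */
static int demo_slot_is_ptr(const uint8_t *slot_type)
{
	return slot_type[0] == DEMO_SPILL;
}

/* Partial clearing: only the bytes covered by the scalar write change. */
static void demo_write_scalar_partial(uint8_t *slot_type, int off, int size)
{
	for (int i = 0; i < size; i++)
		slot_type[off + i] = DEMO_MISC;
}

/* Full clearing: a former spill slot is downgraded in every byte first. */
static void demo_write_scalar_full(uint8_t *slot_type, int off, int size)
{
	if (slot_type[0] == DEMO_SPILL)
		for (int i = 0; i < DEMO_REG_SIZE; i++)
			slot_type[i] = DEMO_MISC;
	demo_write_scalar_partial(slot_type, off, size);
}
```

After a 4-byte scalar write into the upper half of a spilled slot, the
partial variant still reports the slot as a ptr, while the full variant
does not.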

Before this patch, test_xdp_noinline.o under bpf selftests, bpf_lxc.o and
bpf_netdev.o under Cilium bpf repo, when built with -mattr=+alu32, are
rejected due to this issue. After this patch, they are all accepted.

There is no processed insn number change before and after this patch on
Cilium bpf programs.

Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Reviewed-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Matt Mullins [Thu, 13 Dec 2018 00:42:37 +0000 (16:42 -0800)]
bpf: support raw tracepoints in modules

Distributions build drivers as modules, including network and filesystem
drivers which export numerous tracepoints.  This enables
bpf(BPF_RAW_TRACEPOINT_OPEN) to attach to those tracepoints.

Signed-off-by: Matt Mullins <mmullins@fb.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Daniel Borkmann [Tue, 18 Dec 2018 13:47:18 +0000 (14:47 +0100)]
Merge branch 'bpf-bpftool-mount-tracefs'

Quentin Monnet says:

====================
This series focuses on mounting (or not mounting) tracefs with bpftool.

First patch makes bpftool attempt to mount tracefs if tracefs is not
found when running "bpftool prog tracelog".

Second patch adds an option to bpftool to prevent it from attempting
to mount any file system (tracefs or bpffs), in case this behaviour
is undesirable for some users.
====================

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Quentin Monnet [Tue, 18 Dec 2018 10:13:19 +0000 (10:13 +0000)]
tools: bpftool: add an option to prevent auto-mount of bpffs, tracefs

In order to make life easier for users, bpftool automatically attempts
to mount the BPF virtual file system, if it is not mounted already,
before trying to pin objects in it. Similarly, it attempts to mount
tracefs if necessary before trying to dump the trace pipe to the
console.

While mounting file systems on-the-fly can improve user experience, some
administrators might prefer to avoid that. Let's add an option to block
these mount attempts. Note that it does not prevent automatic mounting
of tracefs by debugfs for the "bpftool prog tracelog" command.

Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Quentin Monnet [Tue, 18 Dec 2018 10:13:18 +0000 (10:13 +0000)]
tools: bpftool: attempt to mount tracefs if required for tracelog cmd

As a follow-up to commit 30da46b5dc3a ("tools: bpftool: add a command to
dump the trace pipe"), attempt to mount the tracefs virtual file system
if it is not detected on the system before trying to dump content of the
tracing pipe on an invocation of "bpftool prog tracelog".

Usually, tracefs is automatically mounted by debugfs when the user tries
to access it (e.g. "ls /sys/kernel/debug/tracing" mounts the tracefs).
So if we failed to find it, it is probable that debugfs is not here
either. Therefore, we just attempt a single mount, at a location that
does not involve debugfs: /sys/kernel/tracing.

Suggested-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Yonghong Song [Tue, 18 Dec 2018 01:31:57 +0000 (17:31 -0800)]
tools/bpf: check precise {func, line, jited_line}_info_rec_size in test_btf

Current btf func_info, line_info and jited_line are designed to be
extensible. The record sizes for {func,line}_info are passed to kernel,
and the record sizes for {func,line,jited_line}_info are returned to
userspace during bpf_prog_info query.

In bpf selftests test_btf.c, when testing whether the kernel returns
a legitimate {func,line,jited_line}_info rec_size, the test only
compares to the minimum allowed size. If the returned rec_size is smaller
than the minimum allowed size, it is considered incorrect.
The minimum allowed sizes for these three info types are equal to the
current values of sizeof(struct bpf_func_info), sizeof(struct bpf_line_info)
and sizeof(__u64).

The original thinking was that in the future when rec_size is increased
in kernel, the same test should run correctly. But this sacrificed
the precision of testing under the very kernel the test is shipped with,
and bpf selftest is typically run with the same repo kernel.

So this patch changed the testing of rec_size such that the
kernel returned value should be equal to the size defined by
tools uapi header bpf.h which syncs with kernel uapi header.

Martin discovered a bug in one of rec_size comparisons.
Instead of comparing to minimum func_info rec_size 8, it compares to 4.
This patch fixed that issue as well.

Fixes: 999d82cbc044 ("tools/bpf: enhance test_btf file testing to test func info")
Fixes: 05687352c600 ("bpf: Refactor and bug fix in test_func_type in test_btf.c")
Fixes: 4d6304c76355 ("bpf: Add unit tests for bpf_line_info")
Suggested-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Prashant Bhole [Mon, 17 Dec 2018 07:57:50 +0000 (16:57 +0900)]
bpf: libbpf: fix memleak by freeing line_info

This patch fixes a memory leak in libbpf by freeing up line_info
member of struct bpf_program while unloading a program.

Fixes: 3d65014146c6 ("bpf: libbpf: Add btf_line_info support to libbpf")
Signed-off-by: Prashant Bhole <bhole_prashant_q7@lab.ntt.co.jp>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Daniel Borkmann [Tue, 18 Dec 2018 00:12:00 +0000 (01:12 +0100)]
Merge branch 'bpf-btf-type-fixes'

Yonghong Song says:

====================
Commit 69b693f0aefa ("bpf: btf: Introduce BPF Type Format (BTF)")
introduced BTF, a debug info format for BPF.

The original design has a couple of issues though.
First, the bitfield size is only encoded in int type.
If the struct member bitfield type is enum, pahole ([1])
or llvm is forced to replace enum with int type. As a result, the original
type information gets lost.

Second, the original BTF design does not envision the possibility of
BTF=>header_file conversion ([2]), hence does not encode "struct" or
"union" info for a forward type. Such information is necessary to
convert BTF to a header file.

This patch set fixed the issue by introducing kind_flag, using one bit
in type->info. When kind_flag is set, the struct/union btf_member->offset
will encode both bitfield_size and bit_offset, covering both
int and enum base types. The kind_flag is also used to indicate whether
the forward type is a union (when set) or a struct.
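
A small sketch of the encoding described above. The bit positions used
here (kind_flag in bit 31 of info, bitfield_size in the top 8 bits of the
member offset, bit_offset in the low 24 bits) follow my reading of the
uapi btf.h and should be treated as assumptions; the demo_* helper names
are hypothetical.

```c
#include <stdint.h>

/* kind_flag is assumed to live in the top bit of btf_type.info. */
static uint32_t demo_kflag(uint32_t info)
{
	return info >> 31;
}

/* With kind_flag set, the member offset packs two values. */
static uint32_t demo_bitfield_size(uint32_t offset)
{
	return offset >> 24;
}

static uint32_t demo_bit_offset(uint32_t offset)
{
	return offset & 0xffffff;
}

/* Pack bitfield_size and bit_offset into one member offset word. */
static uint32_t demo_member_offset(uint32_t bitfield_size, uint32_t bit_offset)
{
	return (bitfield_size << 24) | (bit_offset & 0xffffff);
}
```

For example, a 4-bit member starting at bit 160 would be packed as
demo_member_offset(4, 160) and unpacked with the two accessors above.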

Patch #1 refactors function btf_int_bits_seq_show() so Patch #2
can reuse part of the function.
Patch #2 implemented kind_flag support for struct/union/fwd types.
Patch #3 added kind_flag support for cgroup local storage map pretty print.
Patch #4 syncs kernel uapi btf.h to tools directory.
Patch #5 added unit tests for kind_flag.
Patch #6 added tests for kernel bpffs based pretty print with kind_flag.
Patch #7 refactors function btf_dumper_int_bits() so Patch #8
can reuse part of the function.
Patch #8 added bpftool support of pretty print with kind_flag set.

  [1] https://git.kernel.org/pub/scm/devel/pahole/pahole.git/commit/?id=b18354f64cc215368c3bc0df4a7e5341c55c378c
  [2] https://lwn.net/SubscriberLink/773198/fe3074838f5c3f26/

Change logs:
  v2 -> v3:
    . Relocated comments about bitfield_size/bit_offset interpretation
      of the "offset" field right before the "offset" struct member.
    . Added missing byte alignment checking for non-bitfield enum
      member of a struct with kind_flag set.
    . Added two test cases in unit tests for struct type, kind_flag set,
      non-bitfield int/enum member, not-byte aligned bit offsets.
    . Added comments to help understand there is no overflow for
      total_bits_offset in bpftool function btf_dumper_int_bits().
    . Added explanation of typedef type dumping fix in Patch #8 commit
      message.

  v1 -> v2:
    . If kind_flag is set for a structure, ensure an int member,
      whether it is a bitfield or not, is a regular int type.
    . Added support so cgroup local storage map pretty print
      works with kind_flag.
====================

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Yonghong Song [Sun, 16 Dec 2018 06:13:58 +0000 (22:13 -0800)]
tools: bpftool: support pretty print with kind_flag set

The following example shows map pretty print with structures
which include bitfield members.

  enum A { A1, A2, A3, A4, A5 };
  typedef enum A ___A;
  struct tmp_t {
       char a1:4;
       int  a2:4;
       int  :4;
       __u32 a3:4;
       int b;
       ___A b1:4;
       enum A b2:4;
  };
  struct bpf_map_def SEC("maps") tmpmap = {
       .type = BPF_MAP_TYPE_ARRAY,
       .key_size = sizeof(__u32),
       .value_size = sizeof(struct tmp_t),
       .max_entries = 1,
  };
  BPF_ANNOTATE_KV_PAIR(tmpmap, int, struct tmp_t);

and the following map update in the bpf program:

  key = 0;
  struct tmp_t t = {};
  t.a1 = 2;
  t.a2 = 4;
  t.a3 = 6;
  t.b = 7;
  t.b1 = 8;
  t.b2 = 10;
  bpf_map_update_elem(&tmpmap, &key, &t, 0);

With this patch, I am able to print out the map values
correctly:
bpftool map dump id 187
  [{
        "key": 0,
        "value": {
            "a1": 0x2,
            "a2": 0x4,
            "a3": 0x6,
            "b": 7,
            "b1": 0x8,
            "b2": 0xa
        }
    }
  ]

Previously, if a function prototype argument has a typedef
type, the prototype is not printed since
function __btf_dumper_type_only() bailed out with error
if the type is a typedef. This commit corrected this
behavior by printing out typedef properly.

The following example shows forward type and
typedef type can be properly printed in function prototype
with modified test_btf_haskv.c.

  struct t;
  union  u;

  __attribute__((noinline))
  static int test_long_fname_1(struct dummy_tracepoint_args *arg,
                               struct t *p1, union u *p2,
                               __u32 unused)
  ...
  int _dummy_tracepoint(struct dummy_tracepoint_args *arg) {
    return test_long_fname_1(arg, 0, 0, 0);
  }

  $ bpftool p d xlated id 24
  ...
  int test_long_fname_1(struct dummy_tracepoint_args * arg,
                        struct t * p1, union u * p2,
                        __u32 unused)
  ...

Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Yonghong Song [Sun, 16 Dec 2018 06:13:57 +0000 (22:13 -0800)]
tools: bpftool: refactor btf_dumper_int_bits()

The core dump functionality in btf_dumper_int_bits() is
refactored into a separate function btf_dumper_bitfield()
which will be used by the next patch.

Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Yonghong Song [Sun, 16 Dec 2018 06:13:56 +0000 (22:13 -0800)]
tools/bpf: test kernel bpffs map pretty print with struct kind_flag

The new tests are added to test bpffs map pretty print in kernel with kind_flag
for structure type.

  $ test_btf -p
  ......
  BTF pretty print array(#1)......OK
  BTF pretty print array(#2)......OK
  PASS:8 SKIP:0 FAIL:0

Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Yonghong Song [Sun, 16 Dec 2018 06:13:55 +0000 (22:13 -0800)]
tools/bpf: add test_btf unit tests for kind_flag

This patch added unit tests for different types handling
type->info.kind_flag. The following new tests are added:
  $ test_btf
  ...
  BTF raw test[82] (invalid int kind_flag): OK
  BTF raw test[83] (invalid ptr kind_flag): OK
  BTF raw test[84] (invalid array kind_flag): OK
  BTF raw test[85] (invalid enum kind_flag): OK
  BTF raw test[86] (valid fwd kind_flag): OK
  BTF raw test[87] (invalid typedef kind_flag): OK
  BTF raw test[88] (invalid volatile kind_flag): OK
  BTF raw test[89] (invalid const kind_flag): OK
  BTF raw test[90] (invalid restrict kind_flag): OK
  BTF raw test[91] (invalid func kind_flag): OK
  BTF raw test[92] (invalid func_proto kind_flag): OK
  BTF raw test[93] (valid struct kind_flag, bitfield_size = 0): OK
  BTF raw test[94] (valid struct kind_flag, int member, bitfield_size != 0): OK
  BTF raw test[95] (valid union kind_flag, int member, bitfield_size != 0): OK
  BTF raw test[96] (valid struct kind_flag, enum member, bitfield_size != 0): OK
  BTF raw test[97] (valid union kind_flag, enum member, bitfield_size != 0): OK
  BTF raw test[98] (valid struct kind_flag, typedef member, bitfield_size != 0): OK
  BTF raw test[99] (valid union kind_flag, typedef member, bitfield_size != 0): OK
  BTF raw test[100] (invalid struct type, bitfield_size greater than struct size): OK
  BTF raw test[101] (invalid struct type, kind_flag bitfield base_type int not regular): OK
  BTF raw test[102] (invalid struct type, kind_flag base_type int not regular): OK
  BTF raw test[103] (invalid union type, bitfield_size greater than struct size): OK
  ...
  PASS:122 SKIP:0 FAIL:0

The second parameter name of macro
  BTF_INFO_ENC(kind, root, vlen)
in selftests test_btf.c is also renamed from "root" to "kind_flag".
Note that before this patch "root" is not used and always 0.

Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Yonghong Song [Sun, 16 Dec 2018 06:13:53 +0000 (22:13 -0800)]
tools/bpf: sync btf.h header from kernel to tools

Sync include/uapi/linux/btf.h to tools/include/uapi/linux/btf.h.

Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Yonghong Song [Sun, 16 Dec 2018 06:13:52 +0000 (22:13 -0800)]
bpf: enable cgroup local storage map pretty print with kind_flag

Commit 970289fc0a83 ("bpf: add bpffs pretty print for cgroup
local storage maps") added bpffs pretty print for cgroup
local storage maps. The commit worked for struct without kind_flag
set.

This patch refactored and made pretty print also work
with kind_flag set for the struct.

Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
5 years ago bpf: btf: fix struct/union/fwd types with kind_flag
Yonghong Song [Sun, 16 Dec 2018 06:13:51 +0000 (22:13 -0800)]
bpf: btf: fix struct/union/fwd types with kind_flag

This patch fixed two issues with BTF. One is related to
struct/union bitfield encoding and the other is related to
forward type.

Issue #1 and solution:
======================

Current btf encoding of bitfield follows what pahole generates.
For each bitfield, pahole will duplicate the type chain and
put the bitfield size at the final int or enum type.
Since the BTF enum type cannot encode bit size,
pahole works around the issue by generating
an int type whenever the enum bit size is not 32.

For example,
  -bash-4.4$ cat t.c
  typedef int ___int;
  enum A { A1, A2, A3 };
  struct t {
    int a[5];
    ___int b:4;
    volatile enum A c:4;
  } g;
  -bash-4.4$ gcc -c -O2 -g t.c
The current kernel supports the following BTF encoding:
  $ pahole -JV t.o
  [1] TYPEDEF ___int type_id=2
  [2] INT int size=4 bit_offset=0 nr_bits=32 encoding=SIGNED
  [3] ENUM A size=4 vlen=3
        A1 val=0
        A2 val=1
        A3 val=2
  [4] STRUCT t size=24 vlen=3
        a type_id=5 bits_offset=0
        b type_id=9 bits_offset=160
        c type_id=11 bits_offset=164
  [5] ARRAY (anon) type_id=2 index_type_id=2 nr_elems=5
  [6] INT sizetype size=8 bit_offset=0 nr_bits=64 encoding=(none)
  [7] VOLATILE (anon) type_id=3
  [8] INT int size=1 bit_offset=0 nr_bits=4 encoding=(none)
  [9] TYPEDEF ___int type_id=8
  [10] INT (anon) size=1 bit_offset=0 nr_bits=4 encoding=SIGNED
  [11] VOLATILE (anon) type_id=10

Two issues are in the above:
  . by changing enum type to int, we lost the original
    type information and this will not be ideal later
    when we try to convert BTF to a header file.
  . the type duplication for bitfields will cause
    BTF bloat. Duplicated types cannot be deduplicated
    later if the bitfield size is different.

To fix this issue, this patch implemented a compatible
change for BTF struct type encoding:
  . the bit 31 of struct_type->info, previously reserved,
    now is used to indicate whether bitfield_size is
    encoded in btf_member or not.
  . if bit 31 of struct_type->info is set,
    btf_member->offset will encode like:
      bit 0 - 23: bit offset
      bit 24 - 31: bitfield size
    if bit 31 is not set, the old behavior is preserved:
      bit 0 - 31: bit offset

So if the struct contains a bitfield, the maximum bit offset
is reduced to (2^24 - 1) instead of (2^32 - 1). The maximum
bitfield size is 255 bits (an 8-bit field), which is enough for today
as the largest bitfield a compiler can emit is 128 bits, where the
int128 type is supported.
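
The packing described above can be sketched in plain C. This is a
minimal illustration only: the toy_* helper names are made up for this
sketch and are not the kernel's API, and the info word here carries
nothing but kind_flag. The example values are those of member "b" of
struct t (bitfield size 4, bit offset 160).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers, named for this sketch only. */

/* Bit 31 of the struct/union type's info word is kind_flag. */
static inline int toy_kind_flag(uint32_t info)
{
	return info >> 31;
}

/* Pack a member offset as described for kind_flag == 1. */
static inline uint32_t toy_member_offset_enc(uint32_t bitfield_size,
					     uint32_t bit_offset)
{
	return (bitfield_size << 24) | (bit_offset & 0xffffffu);
}

/* Bits 0-23 hold the bit offset when kind_flag is set, else all 32 bits. */
static inline uint32_t toy_member_bit_offset(uint32_t info, uint32_t offset)
{
	return toy_kind_flag(info) ? (offset & 0xffffffu) : offset;
}

/* Bits 24-31 hold the bitfield size when kind_flag is set. */
static inline uint32_t toy_member_bitfield_size(uint32_t info, uint32_t offset)
{
	return toy_kind_flag(info) ? (offset >> 24) : 0;
}

static void toy_member_example(void)
{
	uint32_t info = 1u << 31;	/* only kind_flag set; kind and vlen omitted */
	uint32_t off = toy_member_offset_enc(4, 160);

	assert(off == 0x040000a0u);
	assert(toy_member_bit_offset(info, off) == 160);
	assert(toy_member_bitfield_size(info, off) == 4);
	/* Without kind_flag the whole word is the bit offset. */
	assert(toy_member_bit_offset(0, 160) == 160);
}
```

Keeping the size in the top byte means a decoder that ignores kind_flag
would misread the offset, which is why both accessors take the info
word of the enclosing struct.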

This kernel patch intends to support the new BTF encoding:
  $ pahole -JV t.o
  [1] TYPEDEF ___int type_id=2
  [2] INT int size=4 bit_offset=0 nr_bits=32 encoding=SIGNED
  [3] ENUM A size=4 vlen=3
        A1 val=0
        A2 val=1
        A3 val=2
  [4] STRUCT t kind_flag=1 size=24 vlen=3
        a type_id=5 bitfield_size=0 bits_offset=0
        b type_id=1 bitfield_size=4 bits_offset=160
        c type_id=7 bitfield_size=4 bits_offset=164
  [5] ARRAY (anon) type_id=2 index_type_id=2 nr_elems=5
  [6] INT sizetype size=8 bit_offset=0 nr_bits=64 encoding=(none)
  [7] VOLATILE (anon) type_id=3

Issue #2 and solution:
======================

Current forward type in BTF does not specify whether the original
type is struct or union. This will not work for type pretty print
and BTF-to-header-file conversion as struct/union must be specified.
  $ cat tt.c
  struct t;
  union u;
  int foo(struct t *t, union u *u) { return 0; }
  $ gcc -c -g -O2 tt.c
  $ pahole -JV tt.o
  [1] INT int size=4 bit_offset=0 nr_bits=32 encoding=SIGNED
  [2] FWD t type_id=0
  [3] PTR (anon) type_id=2
  [4] FWD u type_id=0
  [5] PTR (anon) type_id=4

To fix this issue, similar to issue #1, type->info bit 31
is used. If the bit is set, it is union type. Otherwise, it is
a struct type.

  $ pahole -JV tt.o
  [1] INT int size=4 bit_offset=0 nr_bits=32 encoding=SIGNED
  [2] FWD t kind_flag=0 type_id=0
  [3] PTR (anon) kind_flag=0 type_id=2
  [4] FWD u kind_flag=1 type_id=0
  [5] PTR (anon) kind_flag=0 type_id=4
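
A consumer such as a pretty printer could pick the keyword for a
forward declaration from bit 31 alone. The sketch below assumes the
info layout used by the selftest macro BTF_INFO_ENC(kind, kind_flag,
vlen); the toy_* names are hypothetical and exist only for this
illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define TOY_KIND_FWD 7	/* value of BTF_KIND_FWD in the uapi header */

/* Same shape as the selftest's BTF_INFO_ENC(kind, kind_flag, vlen). */
static inline uint32_t toy_info_enc(uint32_t kind, int kind_flag, uint32_t vlen)
{
	return ((uint32_t)!!kind_flag << 31) | (kind << 24) | (vlen & 0xffffu);
}

/* For a FWD type: kind_flag set means union, clear means struct. */
static inline const char *toy_fwd_keyword(uint32_t info)
{
	return (info >> 31) ? "union" : "struct";
}

static void toy_fwd_example(void)
{
	uint32_t fwd_t = toy_info_enc(TOY_KIND_FWD, 0, 0);	/* struct t; */
	uint32_t fwd_u = toy_info_enc(TOY_KIND_FWD, 1, 0);	/* union u;  */

	assert(strcmp(toy_fwd_keyword(fwd_t), "struct") == 0);
	assert(strcmp(toy_fwd_keyword(fwd_u), "union") == 0);
}
```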

Pahole/LLVM change:
===================

The new kind_flag functionality has been implemented in pahole
and llvm:
  https://github.com/yonghong-song/pahole/tree/bitfield
  https://github.com/yonghong-song/llvm/tree/bitfield

Note that pahole hasn't implemented func/func_proto kind
and .BTF.ext. So to print function signature with bpftool,
the llvm compiler should be used.

Fixes: 69b693f0aefa ("bpf: btf: Introduce BPF Type Format (BTF)")
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
5 years ago bpf: btf: refactor btf_int_bits_seq_show()
Yonghong Song [Sun, 16 Dec 2018 06:13:50 +0000 (22:13 -0800)]
bpf: btf: refactor btf_int_bits_seq_show()

Refactor function btf_int_bits_seq_show() by creating
function btf_bitfield_seq_show() which has no dependence
on btf and btf_type. The function btf_bitfield_seq_show()
will be used in a later patch to directly dump bitfield member values.

Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
5 years ago bpf: remove useless version check for prog load
Daniel Borkmann [Sat, 15 Dec 2018 23:49:47 +0000 (00:49 +0100)]
bpf: remove useless version check for prog load

Existing libraries and tracing frameworks work around this kernel
version check by automatically deriving the kernel version from
uname(3) or similar such that the user does not need to do it
manually; these workarounds also make the version check useless
at the same time.

Moreover, most other BPF tracing types enabling bpf_probe_read()-like
functionality have /not/ adopted this check, and in general these
days it is well understood anyway that all the tracing programs are
not stable with regards to future kernels as kernel internal data
structures are subject to change from release to release.

Back at last netconf we discussed [0] and agreed to remove this
check from bpf_prog_load() and instead document it here in the uapi
header that there is no such guarantee for stable API for these
programs.

  [0] http://vger.kernel.org/netconf2018_files/DanielBorkmann_netconf2018.pdf

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
5 years ago Merge branch 'bpf-bpftool-cleanups'
Daniel Borkmann [Sat, 15 Dec 2018 00:31:49 +0000 (01:31 +0100)]
Merge branch 'bpf-bpftool-cleanups'

Quentin Monnet says:

====================
This series contains several minor fixes for bpftool source and
documentation.

The first patches focus on documentation: addition of an option in the page
for "bpftool prog", clean up and update of the same page, and addition of
an example of prog array map manipulation in "bpftool map" page.

The last two fix warnings that can appear when libbfd is not present
(patch 4), or with additional warning flags passed to the compiler (last
patch).
====================

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
5 years ago tools: bpftool: fix -Wmissing declaration warnings
Quentin Monnet [Fri, 14 Dec 2018 13:56:01 +0000 (13:56 +0000)]
tools: bpftool: fix -Wmissing declaration warnings

Help compiler check arguments for several utility functions used to
print items to the console by adding the "printf" attribute when
declaring those functions.

Also, declare as "static" two functions that are only used in prog.c.

All of them discovered by compiling bpftool with
-Wmissing-format-attribute -Wmissing-declarations.
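
For illustration, this is what the "printf" attribute looks like on a
variadic helper with GCC or Clang. The function toy_format() is a
made-up stand-in, not one of bpftool's printing functions; the two
numbers give the position of the format string and of the first
variadic argument.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: argument 3 is the format, varargs start at 4. */
int toy_format(char *buf, size_t len, const char *fmt, ...)
	__attribute__((format(printf, 3, 4)));

int toy_format(char *buf, size_t len, const char *fmt, ...)
{
	va_list ap;
	int n;

	va_start(ap, fmt);
	n = vsnprintf(buf, len, fmt, ap);
	va_end(ap);
	return n;
}

static void toy_format_example(void)
{
	char buf[32];

	/*
	 * With the attribute in place, a mismatched call such as
	 * toy_format(buf, sizeof(buf), "id %d", "seven") draws a
	 * -Wformat warning at compile time.
	 */
	assert(toy_format(buf, sizeof(buf), "id %d", 7) == 4);
	assert(strcmp(buf, "id 7") == 0);
}
```

The separate prototype also keeps -Wmissing-declarations quiet for a
non-static function; a helper used in a single file would be marked
"static" instead.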

Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
5 years ago tools: bpftool: fix warning on struct bpf_prog_linfo definition
Quentin Monnet [Fri, 14 Dec 2018 13:56:00 +0000 (13:56 +0000)]
tools: bpftool: fix warning on struct bpf_prog_linfo definition

The following warning appears when compiling bpftool without BFD
support:

main.h:198:23: warning: 'struct bpf_prog_linfo' declared inside
    parameter list will not be visible outside of this definition or
    declaration
          const struct bpf_prog_linfo *prog_linfo,

Fix it by declaring struct bpf_prog_linfo even in the case BFD is not
supported.
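
The general shape of such a fix is a file-scope forward declaration
ahead of the prototype. The sketch below uses made-up names
(toy_linfo, toy_has_linfo) and is not the bpftool change itself.

```c
#include <assert.h>
#include <stddef.h>

/*
 * File-scope forward declaration. Without it, the struct would first be
 * named inside the parameter list below and be scoped to that
 * declaration only, which is what the warning complains about.
 */
struct toy_linfo;

int toy_has_linfo(const struct toy_linfo *linfo);

/* The type can stay incomplete as long as only pointers are passed. */
int toy_has_linfo(const struct toy_linfo *linfo)
{
	return linfo != NULL;
}

static void toy_linfo_example(void)
{
	assert(toy_has_linfo(NULL) == 0);
}
```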

Fixes: b053b439b72a ("bpf: libbpf: bpftool: Print bpf_line_info during prog dump")
Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
5 years ago tools: bpftool: add a prog array map update example to documentation
Quentin Monnet [Fri, 14 Dec 2018 13:55:59 +0000 (13:55 +0000)]
tools: bpftool: add a prog array map update example to documentation

Add an example in map documentation to show how to use bpftool in order
to update the references to programs held by prog array maps.

Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
5 years ago tools: bpftool: fix examples in documentation for bpftool prog
Quentin Monnet [Fri, 14 Dec 2018 13:55:58 +0000 (13:55 +0000)]
tools: bpftool: fix examples in documentation for bpftool prog

Bring various fixes to the manual page for "bpftool prog" set of
commands:

- Fix typos ("dum" -> "dump")
- Harmonise indentation and format for command output
- Update date format for program load time
- Add instruction numbers on program dumps
- Fix JSON format for the example program listing

Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
5 years ago tools: bpftool: add doc for -m option to bpftool-prog.rst
Quentin Monnet [Fri, 14 Dec 2018 13:55:57 +0000 (13:55 +0000)]
tools: bpftool: add doc for -m option to bpftool-prog.rst

The --mapcompat|-m option has been documented on the main bpftool.rst
page, and on the interactive help. As this option is useful for loading
programs with maps with the "bpftool prog load" command, it should also
appear in the related bpftool-prog.rst documentation page. Let's add it.

Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
5 years ago Merge branch 'bpf-improve-verifier-state-analysis'
Daniel Borkmann [Sat, 15 Dec 2018 00:28:33 +0000 (01:28 +0100)]
Merge branch 'bpf-improve-verifier-state-analysis'

Alexei Starovoitov says:

====================
v1->v2:
With the optimization suggested by Jakub, the safety check in patch 4
became cheap enough.

Several improvements to verifier state logic.
Patch 1 - trivial optimization
Patch 3 - significant optimization for stack state equivalence
Patch 4 - safety check for liveness and prep for future state merging
====================

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
5 years ago bpf: add self-check logic to liveness analysis
Alexei Starovoitov [Thu, 13 Dec 2018 19:42:34 +0000 (11:42 -0800)]
bpf: add self-check logic to liveness analysis

Introduce REG_LIVE_DONE to check the liveness propagation
and prepare the states for merging.
See algorithm description in clean_live_states().

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
5 years ago bpf: improve stacksafe state comparison
Alexei Starovoitov [Thu, 13 Dec 2018 19:42:33 +0000 (11:42 -0800)]
bpf: improve stacksafe state comparison

"if (old->allocated_stack > cur->allocated_stack)" check is too conservative.
In some cases explored stack could have allocated more space,
but that stack space was not live.
The test case improves from 19 to 15 processed insns
and improvement on real programs is significant as well:

                       before    after
bpf_lb-DLB_L3.o        1940      1831
bpf_lb-DLB_L4.o        3089      3029
bpf_lb-DUNKNOWN.o      1065      1064
bpf_lxc-DDROP_ALL.o    28052     26309
bpf_lxc-DUNKNOWN.o     35487     33517
bpf_netdev.o           10864     9713
bpf_overlay.o          6643      6184
bpf_lcx_jit.o          38437     37335
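
The idea can be shown on a toy model. The sketch below is not the
verifier's code: struct toy_stack, its fields and the byte-equality
test are simplifying assumptions made for this illustration. It only
shows why a larger explored stack is acceptable when the extra slots
were never read, and it skips a dead slot in one step instead of
testing its liveness once per byte.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_SLOT_SIZE 8	/* bytes per stack slot, as with BPF_REG_SIZE */

/* Toy stand-in for a verifier stack state; not the kernel's layout. */
struct toy_stack {
	size_t allocated;		/* bytes, a multiple of TOY_SLOT_SIZE */
	const bool *slot_live;		/* per slot: was it read afterwards? */
	const unsigned char *bytes;	/* toy per-byte contents */
};

/* True if "cur" may be pruned against the already explored "old". */
static bool toy_stacksafe(const struct toy_stack *old,
			  const struct toy_stack *cur)
{
	for (size_t i = 0; i < old->allocated; i++) {
		if (!old->slot_live[i / TOY_SLOT_SIZE]) {
			/* Explored state never read this slot: skip all 8 bytes. */
			i += TOY_SLOT_SIZE - 1;
			continue;
		}
		/* A live slot in old must also exist in cur ... */
		if (i >= cur->allocated)
			return false;
		/* ... and hold equivalent contents (toy: byte equality). */
		if (old->bytes[i] != cur->bytes[i])
			return false;
	}
	return true;
}

static void toy_stacksafe_example(void)
{
	static const unsigned char data[16] = { 1, 2, 3 };
	static const bool only_first[2] = { true, false };
	static const bool both[2] = { true, true };
	struct toy_stack cur = { 8, only_first, data };
	struct toy_stack old = { 16, only_first, data };

	/* old allocated more, but the extra slot was never live */
	assert(toy_stacksafe(&old, &cur));

	/* once the extra slot is live, cur no longer covers it */
	old.slot_live = both;
	assert(!toy_stacksafe(&old, &cur));
}
```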

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Edward Cree <ecree@solarflare.com>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
5 years ago selftests/bpf: check insn processed in test_verifier
Alexei Starovoitov [Thu, 13 Dec 2018 19:42:32 +0000 (11:42 -0800)]
selftests/bpf: check insn processed in test_verifier

Teach test_verifier to parse verifier output for insn processed
and compare with expected number.
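
A minimal sketch of this kind of parsing, assuming the log contains a
summary of the form "processed N insns". The helper name and the
sample log are illustrative and not taken from test_verifier.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Returns the number after "processed ", or -1 if the marker is absent. */
static long toy_insn_processed(const char *log)
{
	static const char marker[] = "processed ";
	const char *p = strstr(log, marker);

	if (!p)
		return -1;
	/* sizeof(marker) - 1 skips the marker without its trailing NUL */
	return strtol(p + sizeof(marker) - 1, NULL, 10);
}

static void toy_insn_processed_example(void)
{
	/* Illustrative log tail; the surrounding text is an assumption. */
	const char *log = "0: (b7) r0 = 0\n1: (95) exit\nprocessed 2 insns\n";

	assert(toy_insn_processed(log) == 2);
	assert(toy_insn_processed("no summary here") == -1);
}
```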

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Edward Cree <ecree@solarflare.com>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
5 years ago bpf: speed up stacksafe check
Alexei Starovoitov [Thu, 13 Dec 2018 19:42:31 +0000 (11:42 -0800)]
bpf: speed up stacksafe check

Don't check the same stack liveness condition 8 times.
Once is enough.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Edward Cree <ecree@solarflare.com>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
5 years ago Merge branch 'bpf_line_info-in-verifier'
Alexei Starovoitov [Fri, 14 Dec 2018 22:17:34 +0000 (14:17 -0800)]
Merge branch 'bpf_line_info-in-verifier'

Martin Lau says:

====================
This patch set provides bpf_line_info during the verifier's verbose
log.  Please see individual patch for details.
====================

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
5 years ago bpf: verbose log bpf_line_info in verifier
Martin KaFai Lau [Thu, 13 Dec 2018 18:41:48 +0000 (10:41 -0800)]
bpf: verbose log bpf_line_info in verifier

This patch adds bpf_line_info to the verifier's verbose log.
It can give error context for debugging purposes.

~~~~~~~~~~
Here is the verbose log for backedge:
while (a) {
        a += bpf_get_smp_processor_id();
        bpf_trace_printk(fmt, sizeof(fmt), a);
}

~> bpftool prog load ./test_loop.o /sys/fs/bpf/test_loop type tracepoint
13: while (a) {
3: a += bpf_get_smp_processor_id();
back-edge from insn 13 to 3

~~~~~~~~~~
Here is the verbose log for invalid pkt access:
Modification to test_xdp_noinline.c:

data = (void *)(long)xdp->data;
data_end = (void *)(long)xdp->data_end;
/*
if (data + 4 > data_end)
        return XDP_DROP;
*/
*(u32 *)data = dst->dst;

~> bpftool prog load ./test_xdp_noinline.o /sys/fs/bpf/test_xdp_noinline type xdp
; data = (void *)(long)xdp->data;
224: (79) r2 = *(u64 *)(r10 -112)
225: (61) r2 = *(u32 *)(r2 +0)
; *(u32 *)data = dst->dst;
226: (63) *(u32 *)(r2 +0) = r1
invalid access to packet, off=0 size=4, R2(id=0,off=0,r=0)
R2 offset is outside of the packet

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
5 years ago bpf: Create a new btf_name_by_offset() for non type name use case
Martin KaFai Lau [Thu, 13 Dec 2018 18:41:46 +0000 (10:41 -0800)]
bpf: Create a new btf_name_by_offset() for non type name use case

The current btf_name_by_offset() is returning "(anon)" type name for
the offset == 0 case and "(invalid-name-offset)" for the out-of-bound
offset case.

It fits well for the internal BTF verbose log purpose which
is focusing on type.  For example,
offset == 0 => "(anon)" => anonymous type/name.
Returning non-NULL for the bad offset case is needed
during the BTF verification process because the BTF verifier may
complain about another field first before discovering the name_off
is invalid.

However, it may not be ideal for the newer use case which does not
necessarily mean a type name.  For example, when logging line_info
in the BPF verifier in the next patch, it is better to log an
empty src line instead of logging "(anon)".

The existing btf_name_by_offset() is renamed to __btf_name_by_offset()
and made static to btf.c.

A new btf_name_by_offset() is added for generic context usage.  It
returns "\0" for name_off == 0 (note that btf->strings[0] is "\0")
and NULL for invalid offset.  It allows the caller to decide
what is the best output in its context.
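
The two behaviours can be contrasted on a toy string section. The
struct and function names below are hypothetical and only model the
semantics described in this message; they are not the kernel's
definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy string section; strings[0] is always the empty string. */
struct toy_btf {
	const char *strings;
	unsigned int str_len;
};

/* Type-name flavour: never returns NULL, suited to verbose type logs. */
static const char *toy_type_name_by_offset(const struct toy_btf *btf,
					   unsigned int off)
{
	if (!off)
		return "(anon)";
	if (off < btf->str_len)
		return &btf->strings[off];
	return "(invalid-name-offset)";
}

/* Generic flavour: "" for offset 0, NULL for an out-of-bound offset. */
static const char *toy_name_by_offset(const struct toy_btf *btf,
				      unsigned int off)
{
	if (off < btf->str_len)
		return &btf->strings[off];
	return NULL;
}

static void toy_name_example(void)
{
	/* offsets: 0 -> "", 1 -> "int", 5 -> "foo" */
	static const char sec[] = "\0int\0foo";
	struct toy_btf btf = { sec, sizeof(sec) };

	assert(strcmp(toy_type_name_by_offset(&btf, 0), "(anon)") == 0);
	assert(strcmp(toy_name_by_offset(&btf, 0), "") == 0);
	assert(strcmp(toy_name_by_offset(&btf, 5), "foo") == 0);
	assert(toy_name_by_offset(&btf, 100) == NULL);
}
```

Returning NULL leaves the policy with the caller: a line_info logger can
print an empty source line, while another caller may prefer to reject
the offset outright.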

The new btf_name_by_offset() is overlapped with btf_name_offset_valid().
Hence, btf_name_offset_valid() is removed from btf.h to keep the btf.h API
minimal.  The existing btf_name_offset_valid() usage in btf.c could also be
replaced later.

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
5 years ago selftests/bpf: Fix sk lookup usage in test_sock_addr
Andrey Ignatov [Thu, 13 Dec 2018 21:19:01 +0000 (13:19 -0800)]
selftests/bpf: Fix sk lookup usage in test_sock_addr

The semantics of the netns_id argument of bpf_sk_lookup_tcp and
bpf_sk_lookup_udp were changed (fixed) in f71c6143c203. Corresponding
changes have to be
applied to all call sites in selftests. The patch fixes corresponding
call sites in test_sock_addr test: pass BPF_F_CURRENT_NETNS instead of 0
in netns_id argument.

Fixes: f71c6143c203 ("bpf: Support sk lookup in netns with id 0")
Reported-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Andrey Ignatov <rdna@fb.com>
Acked-by: Joe Stringer <joe@wand.net.nz>
Tested-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
5 years ago bpf: remove obsolete prog->aux sanitation in bpf_insn_prepare_dump
Daniel Borkmann [Wed, 12 Dec 2018 09:45:38 +0000 (10:45 +0100)]
bpf: remove obsolete prog->aux sanitation in bpf_insn_prepare_dump

This logic is not needed anymore since we got rid of the verifier
rewrite that was using prog->aux address in f6069b9aa993 ("bpf:
fix redirect to map under tail calls").

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
5 years ago bpf: sync tools/include/uapi/linux/bpf.h
Song Liu [Wed, 12 Dec 2018 17:37:47 +0000 (09:37 -0800)]
bpf: sync tools/include/uapi/linux/bpf.h

Sync bpf.h for nr_prog_tags and prog_tags.

Signed-off-by: Song Liu <songliubraving@fb.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
5 years ago bpf: include sub program tags in bpf_prog_info
Song Liu [Wed, 12 Dec 2018 17:37:46 +0000 (09:37 -0800)]
bpf: include sub program tags in bpf_prog_info

Changes v2 -> v3:
1. remove check for bpf_dump_raw_ok().

Changes v1 -> v2:
1. Fix error path as Martin suggested.

This patch adds nr_prog_tags and prog_tags to bpf_prog_info. This is a
reliable way for user space to get tags of all sub programs. Before this
patch, user space needs to find sub program tags via kallsyms.

This feature will be used in BPF introspection, where user space queries
information about BPF programs via sys_bpf.

Signed-off-by: Song Liu <songliubraving@fb.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
5 years ago Merge branch 'bpf-fix-kptr-checks'
Daniel Borkmann [Thu, 13 Dec 2018 11:16:31 +0000 (12:16 +0100)]
Merge branch 'bpf-fix-kptr-checks'

Martin KaFai Lau says:

====================
This patch set removes the bpf_dump_raw_ok() guard for the func_info
and line_info during bpf_prog_get_info_by_fd().
====================

Acked-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
5 years ago bpf: Remove !func_info and !line_info check from test_btf and bpftool
Martin KaFai Lau [Wed, 12 Dec 2018 18:18:22 +0000 (10:18 -0800)]
bpf: Remove !func_info and !line_info check from test_btf and bpftool

The kernel can provide the func_info and line_info even if
it fails the bpf_dump_raw_ok() test because they don't contain
kernel addresses.  This patch removes the corresponding '== 0'
test.

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
5 years ago bpf: Remove bpf_dump_raw_ok() check for func_info and line_info
Martin KaFai Lau [Wed, 12 Dec 2018 18:18:21 +0000 (10:18 -0800)]
bpf: Remove bpf_dump_raw_ok() check for func_info and line_info

The func_info and line_info have the bpf insn offset but
they do not contain kernel address.  They will still be useful
for the userspace tool to annotate the xlated insn.

This patch removes the bpf_dump_raw_ok() guard for the
func_info and line_info during bpf_prog_get_info_by_fd().

The guard stays for jited_line_info which contains the kernel
address.

Although this bpf_dump_raw_ok() guard behavior has started since
the earlier func_info patch series, I marked the Fixes tag to the
latest line_info patch series which contains both func_info and
line_info and this patch is fixing for both of them.

Fixes: c454a46b5efd ("bpf: Add bpf_line_info support")
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
5 years ago Merge branch 'bpf-bpftool-license-update'
Daniel Borkmann [Thu, 13 Dec 2018 11:08:45 +0000 (12:08 +0100)]
Merge branch 'bpf-bpftool-license-update'

Jakub Kicinski says:

====================
We are changing/clarifying the license on bpftool to GPLv2-only +
BSD-2-Clause for all files.  Current license mix is incompatible
with libbfd (which is GPLv3-only) and therefore Debian maintainers
are apprehensive about packaging bpftool.

Acks include authors of code which has been copied into bpftool (e.g.
JSON writer from iproute2, code from tools/bpf, code from BPF samples
and selftests, etc.)

Thanks again to all the authors who acked the change!
====================

Acked-by: Roman Gushchin <guro@fb.com>
Acked-by: YueHaibing <yuehaibing@huawei.com>
Acked-by: Yonghong Song <yhs@fb.com>
Acked-by: Stanislav Fomichev <sdf@google.com>
Acked-by: Sean Young <sean@mess.org>
Acked-by: Jiri Benc <jbenc@redhat.com>
Acked-by: David Calavera <david.calavera@gmail.com>
Acked-by: Andrey Ignatov <rdna@fb.com>
Acked-by: Joe Stringer <joe@wand.net.nz>
Acked-by: David Ahern <dsahern@gmail.com>
Acked-by: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Acked-by: Petar Penkov <ppenkov@stanford.edu>
Acked-by: Sandipan Das <sandipan@linux.ibm.com>
Acked-by: Prashant Bhole <bhole_prashant_q7@lab.ntt.co.jp>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Taeung Song <treeze.taeung@gmail.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
CC: okash.khawaja@gmail.com
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
5 years ago tools: bpftool: dual license all files
Jakub Kicinski [Thu, 13 Dec 2018 03:59:26 +0000 (19:59 -0800)]
tools: bpftool: dual license all files

Currently bpftool contains a mix of GPL-only and GPL or BSD2
licensed files.  Make sure all files are dual licensed under
GPLv2 and BSD-2-Clause.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Roman Gushchin <guro@fb.com>
Acked-by: YueHaibing <yuehaibing@huawei.com>
Acked-by: Yonghong Song <yhs@fb.com>
Acked-by: Stanislav Fomichev <sdf@google.com>
Acked-by: Sean Young <sean@mess.org>
Acked-by: Jiri Benc <jbenc@redhat.com>
Acked-by: David Calavera <david.calavera@gmail.com>
Acked-by: Andrey Ignatov <rdna@fb.com>
Acked-by: Joe Stringer <joe@wand.net.nz>
Acked-by: David Ahern <dsahern@gmail.com>
Acked-by: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Acked-by: Petar Penkov <ppenkov@stanford.edu>
Acked-by: Sandipan Das <sandipan@linux.ibm.com>
Acked-by: Prashant Bhole <bhole_prashant_q7@lab.ntt.co.jp>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Taeung Song <treeze.taeung@gmail.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
CC: okash.khawaja@gmail.com
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
5 years ago tools: bpftool: replace Netronome boilerplate with SPDX license headers
Jakub Kicinski [Thu, 13 Dec 2018 03:59:25 +0000 (19:59 -0800)]
tools: bpftool: replace Netronome boilerplate with SPDX license headers

Replace the repeated license text with SPDX identifiers.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Roman Gushchin <guro@fb.com>
Acked-by: YueHaibing <yuehaibing@huawei.com>
Acked-by: Yonghong Song <yhs@fb.com>
Acked-by: Stanislav Fomichev <sdf@google.com>
Acked-by: Sean Young <sean@mess.org>
Acked-by: Jiri Benc <jbenc@redhat.com>
Acked-by: David Calavera <david.calavera@gmail.com>
Acked-by: Andrey Ignatov <rdna@fb.com>
Acked-by: Joe Stringer <joe@wand.net.nz>
Acked-by: David Ahern <dsahern@gmail.com>
Acked-by: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Acked-by: Petar Penkov <ppenkov@stanford.edu>
Acked-by: Sandipan Das <sandipan@linux.ibm.com>
Acked-by: Prashant Bhole <bhole_prashant_q7@lab.ntt.co.jp>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Taeung Song <treeze.taeung@gmail.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
CC: okash.khawaja@gmail.com
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
5 years ago tools: bpftool: fix SPDX format in headers
Jakub Kicinski [Thu, 13 Dec 2018 03:59:24 +0000 (19:59 -0800)]
tools: bpftool: fix SPDX format in headers

Documentation/process/license-rules.rst sayeth:

2. Style:

The SPDX license identifier is added in form of a comment.  The comment
style depends on the file type::

   C source: // SPDX-License-Identifier: <SPDX License Expression>
   C header: /* SPDX-License-Identifier: <SPDX License Expression> */

Headers should use C comment style.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Roman Gushchin <guro@fb.com>
Acked-by: YueHaibing <yuehaibing@huawei.com>
Acked-by: Yonghong Song <yhs@fb.com>
Acked-by: Stanislav Fomichev <sdf@google.com>
Acked-by: Sean Young <sean@mess.org>
Acked-by: Jiri Benc <jbenc@redhat.com>
Acked-by: David Calavera <david.calavera@gmail.com>
Acked-by: Andrey Ignatov <rdna@fb.com>
Acked-by: Joe Stringer <joe@wand.net.nz>
Acked-by: David Ahern <dsahern@gmail.com>
Acked-by: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Acked-by: Petar Penkov <ppenkov@stanford.edu>
Acked-by: Sandipan Das <sandipan@linux.ibm.com>
Acked-by: Prashant Bhole <bhole_prashant_q7@lab.ntt.co.jp>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Taeung Song <treeze.taeung@gmail.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
CC: okash.khawaja@gmail.com
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
5 years ago selftests/bpf: add btf annotations for cgroup_local_storage maps
Roman Gushchin [Mon, 10 Dec 2018 23:43:02 +0000 (15:43 -0800)]
selftests/bpf: add btf annotations for cgroup_local_storage maps

Add btf annotations to cgroup local storage maps (per-cpu and shared)
in the network packet counting example.

Signed-off-by: Roman Gushchin <guro@fb.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
5 years ago bpf: add bpffs pretty print for cgroup local storage maps
Roman Gushchin [Mon, 10 Dec 2018 23:43:01 +0000 (15:43 -0800)]
bpf: add bpffs pretty print for cgroup local storage maps

Implement bpffs pretty printing for cgroup local storage maps
(both shared and per-cpu).
Output example (captured for tools/testing/selftests/bpf/netcnt_prog.c):

Shared:
  $ cat /sys/fs/bpf/map_2
  # WARNING!! The output is for debug purpose only
  # WARNING!! The output format will change
  {4294968594,1}: {9999,1039896}

Per-cpu:
  $ cat /sys/fs/bpf/map_1
  # WARNING!! The output is for debug purpose only
  # WARNING!! The output format will change
  {4294968594,1}: {
   cpu0: {0,0,0,0,0}
   cpu1: {0,0,0,0,0}
   cpu2: {1,104,0,0,0}
   cpu3: {0,0,0,0,0}
  }

Signed-off-by: Roman Gushchin <guro@fb.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
5 years ago bpf: pass struct btf pointer to the map_check_btf() callback
Roman Gushchin [Mon, 10 Dec 2018 23:43:00 +0000 (15:43 -0800)]
bpf: pass struct btf pointer to the map_check_btf() callback

If key_type or value_type are of non-trivial data types
(e.g. structure or typedef), it's not possible to check them without
the additional information, which can't be obtained without a pointer
to the btf structure.

So, let's pass btf pointer to the map_check_btf() callbacks.

Signed-off-by: Roman Gushchin <guro@fb.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
5 years ago selftests/bpf: use __bpf_constant_htons in test_prog.c
Stanislav Fomichev [Wed, 12 Dec 2018 03:20:52 +0000 (19:20 -0800)]
selftests/bpf: use __bpf_constant_htons in test_prog.c

For some reason, my older GCC (< 4.8) isn't smart enough to optimize the
!__builtin_constant_p() branch in bpf_htons, I see:
  error: implicit declaration of function '__builtin_bswap16'

Let's use __bpf_constant_htons as suggested by Daniel Borkmann.

I tried to use simple htons, but it produces the following:
  test_progs.c:54:17: error: braced-group within expression allowed only
  inside a function
    .eth.h_proto = htons(ETH_P_IP),
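
The underlying constraint is that a file-scope initializer needs a
constant expression, which a macro built from plain shifts and masks
provides. The sketch below uses a made-up TOY_CONSTANT_HTONS() rather
than the selftests' __bpf_constant_htons, and assumes a compiler that
predefines __BYTE_ORDER__ (GCC and Clang do); if the macro is missing,
it falls back to assuming a little-endian host.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stdint.h>

/* Hypothetical macro for this sketch; a pure constant expression. */
#if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
#define TOY_CONSTANT_HTONS(x) ((uint16_t)(x))
#else
#define TOY_CONSTANT_HTONS(x) \
	((uint16_t)((((x) & 0x00ffu) << 8) | (((x) & 0xff00u) >> 8)))
#endif

struct toy_eth {
	uint16_t h_proto;
};

/* File-scope initializer: a statement-expression htons() cannot go here. */
static const struct toy_eth toy_pkt = {
	.h_proto = TOY_CONSTANT_HTONS(0x0800),	/* ETH_P_IP */
};

static void toy_htons_example(void)
{
	/* Inside a function the library htons() is fine and must agree. */
	assert(toy_pkt.h_proto == htons(0x0800));
}
```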

Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
5 years ago bpf: arm64: Enable arm64 jit to provide bpf_line_info
Martin KaFai Lau [Wed, 12 Dec 2018 00:02:05 +0000 (16:02 -0800)]
bpf: arm64: Enable arm64 jit to provide bpf_line_info

This patch enables arm64's bpf_int_jit_compile() to provide
bpf_line_info by calling bpf_prog_fill_jited_linfo().

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
5 years ago bpf: fix up uapi helper description and sync bpf header with tools
Daniel Borkmann [Tue, 11 Dec 2018 09:26:33 +0000 (10:26 +0100)]
bpf: fix up uapi helper description and sync bpf header with tools

Minor markup fixup from bpf-next into net-next merge in the BPF helper
description of bpf_sk_lookup_tcp() and bpf_sk_lookup_udp(). Also sync
up the copy of bpf.h from tooling infrastructure.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years ago Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
David S. Miller [Tue, 11 Dec 2018 02:00:43 +0000 (18:00 -0800)]
Merge git://git./linux/kernel/git/bpf/bpf-next

Daniel Borkmann says:

====================
pull-request: bpf-next 2018-12-11

The following pull-request contains BPF updates for your *net-next* tree.

It has three minor merge conflicts, resolutions:

1) tools/testing/selftests/bpf/test_verifier.c

 Take first chunk with alignment_prevented_execution.

2) net/core/filter.c

  [...]
  case bpf_ctx_range_ptr(struct __sk_buff, flow_keys):
  case bpf_ctx_range(struct __sk_buff, wire_len):
        return false;
  [...]

3) include/uapi/linux/bpf.h

  Take the second chunk for the two cases each.

The main changes are:

1) Add support for BPF line info via BTF and extend libbpf as well
   as bpftool's program dump to annotate output with BPF C code to
   facilitate debugging and introspection, from Martin.

2) Add support for BPF_ALU | BPF_ARSH | BPF_{K,X} in interpreter
   and all JIT backends, from Jiong.

3) Improve BPF test coverage on archs with no efficient unaligned
   access by adding an "any alignment" flag to the BPF program load
   to forcefully disable verifier alignment checks, from David.

4) Add a new bpf_prog_test_run_xattr() API to libbpf which allows for
   proper use of BPF_PROG_TEST_RUN with data_out, from Lorenz.

5) Extend tc BPF programs to use a new __sk_buff field called wire_len
   for more accurate accounting of packets going to wire, from Petar.

6) Improve bpftool to allow dumping the trace pipe from it and add
   several improvements in bash completion and map/prog dump,
   from Quentin.

7) Optimize arm64 BPF JIT to always emit movn/movk/movk sequence for
   kernel addresses and add a dedicated BPF JIT backend allocator,
   from Ard.

8) Add a BPF helper function for IR remotes to report mouse movements,
   from Sean.

9) Various cleanups in BPF prog dump e.g. to make UAPI bpf_prog_info
   member naming consistent with existing conventions, from Yonghong
   and Song.

10) Misc cleanups and improvements, allowing the interface name to be
    passed via cmdline for the xdp1 BPF example, from Matteo.

11) Fix a potential segfault in BPF sample loader's kprobes handling,
    from Daniel T.

12) Fix SPDX license in libbpf's README.rst, from Andrey.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoneighbor: gc_list changes should be protected by table lock
David Ahern [Mon, 10 Dec 2018 21:54:07 +0000 (13:54 -0800)]
neighbor: gc_list changes should be protected by table lock

Adding and removing neighbor entries to / from the gc_list need to be
done while holding the table lock; a couple of places were missed in the
original patch.

Move the list_add_tail in neigh_alloc to ___neigh_create where the lock
is already obtained. Since neighbor entries should rarely be moved
to/from PERMANENT state, add lock/unlock around the gc_list changes in
neigh_change_state rather than extending the lock hold around all
neighbor updates.

Fixes: 58956317c8de ("neighbor: Improve garbage collection")
Reported-by: Andrei Vagin <avagin@gmail.com>
Reported-by: syzbot+6cc2fd1d3bdd2e007363@syzkaller.appspotmail.com
Reported-by: syzbot+35e87b87c00f386b041f@syzkaller.appspotmail.com
Reported-by: syzbot+b354d1fb59091ea73c37@syzkaller.appspotmail.com
Reported-by: syzbot+3ddead5619658537909b@syzkaller.appspotmail.com
Reported-by: syzbot+424d47d5c456ce8b2bbe@syzkaller.appspotmail.com
Reported-by: syzbot+e4d42eb35f6a27b0a628@syzkaller.appspotmail.com
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoMerge tag 'mlx5e-updates-2018-12-10' of git://git.kernel.org/pub/scm/linux/kernel...
David S. Miller [Tue, 11 Dec 2018 01:06:58 +0000 (17:06 -0800)]
Merge tag 'mlx5e-updates-2018-12-10' of git://git./linux/kernel/git/saeed/linux

Saeed Mahameed:

====================
mlx5e-updates-2018-12-10 (gre)

This patch set adds GRE offloading support to Mellanox ethernet driver.

Patches 1-5 replace the existing egdev mechanism with the new TC indirect
block binds mechanism that was introduced by Netronome:
7f76fa36754b ("net: sched: register callbacks for indirect tc block binds")

Patches 6-9 add GRE offloading support along with some required
refactoring work.

Patch 10, Add netif_is_gretap()/netif_is_ip6gretap()
 - Changed the is_gretap_dev and is_ip6gretap_dev logic from structure
   comparison to string comparison of the rtnl_link_ops kind field.

Patch 11, add GRE offloading support to mlx5.

Patch 12 removes the egdev mechanism from TC as it is no longer used by
any of the drivers.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet/sched: Remove egdev mechanism
Oz Shlomo [Tue, 6 Nov 2018 07:58:37 +0000 (09:58 +0200)]
net/sched: Remove egdev mechanism

The egdev mechanism was replaced by the TC indirect block notifications
platform.

Signed-off-by: Oz Shlomo <ozsh@mellanox.com>
Reviewed-by: Eli Britstein <elibr@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Cc: John Hurley <john.hurley@netronome.com>
Cc: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
5 years agonet/mlx5e: Add GRE protocol offloading
Oz Shlomo [Mon, 29 Oct 2018 06:54:42 +0000 (08:54 +0200)]
net/mlx5e: Add GRE protocol offloading

Add HW offloading support for TC flower filters configured on
gretap/ip6gretap net devices.

Signed-off-by: Oz Shlomo <ozsh@mellanox.com>
Reviewed-by: Eli Britstein <elibr@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
5 years agonet: Add netif_is_gretap()/netif_is_ip6gretap()
Oz Shlomo [Wed, 21 Nov 2018 10:15:34 +0000 (12:15 +0200)]
net: Add netif_is_gretap()/netif_is_ip6gretap()

Changed the is_gretap_dev and is_ip6gretap_dev logic from structure
comparison to string comparison of the rtnl_link_ops kind field.

This approach aligns with the current identification methods and function
names of vxlan and geneve network devices.

Convert mlxsw to use these helpers and use them in downstream mlx5 patch.
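
The string-based identification described above can be sketched in
user-space C as follows. This is only an illustration: the *_sketch
structures and sketch_* functions are hypothetical stand-ins for the
kernel's net_device and rtnl_link_ops, and the kind strings "gretap"
and "ip6gretap" are assumed from the device names used in this series.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, stripped-down stand-ins for the kernel structures. */
struct rtnl_link_ops_sketch {
	const char *kind;
};

struct net_device_sketch {
	const struct rtnl_link_ops_sketch *rtnl_link_ops;
};

/* Identify a device by the "kind" string of its link ops instead of
 * comparing the ops structure against a driver-private symbol.
 */
static bool sketch_netif_is_kind(const struct net_device_sketch *dev,
				 const char *kind)
{
	return dev->rtnl_link_ops && dev->rtnl_link_ops->kind &&
	       strcmp(dev->rtnl_link_ops->kind, kind) == 0;
}

static bool sketch_netif_is_gretap(const struct net_device_sketch *dev)
{
	return sketch_netif_is_kind(dev, "gretap");
}

static bool sketch_netif_is_ip6gretap(const struct net_device_sketch *dev)
{
	return sketch_netif_is_kind(dev, "ip6gretap");
}
```

Comparing the kind string means a caller does not need access to the
tunnel driver's ops symbol, which is presumably why the same approach
is already used for vxlan and geneve devices.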

Signed-off-by: Oz Shlomo <ozsh@mellanox.com>
Reviewed-by: Eli Britstein <elibr@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
5 years agonet/mlx5e: Move TC tunnel offloading code to separate source file
Oz Shlomo [Sun, 2 Dec 2018 12:43:27 +0000 (14:43 +0200)]
net/mlx5e: Move TC tunnel offloading code to separate source file

Move tunnel offloading related code to a separate source file for better
code maintainability.

Code refactoring with no functional change.

Signed-off-by: Oz Shlomo <ozsh@mellanox.com>
Reviewed-by: Eli Britstein <elibr@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
5 years agonet/mlx5e: Branch according to classified tunnel type
Oz Shlomo [Wed, 14 Nov 2018 13:21:27 +0000 (15:21 +0200)]
net/mlx5e: Branch according to classified tunnel type

Currently the tunnel offloading encap/decap methods assume that VXLAN
is the sole tunneling protocol. Lay the infrastructure for supporting
multiple tunneling protocols by branching according to the tunnel
net device kind.

The encap filter's tunnel type is determined according to the egress/mirred
net device. Decap filters classify the tunnel type according to the
filter's ingress net device kind.

Distinguish between the tunnel type as defined by the SW model and
the FW reformat type that specifies the HW operation being made.

Signed-off-by: Oz Shlomo <ozsh@mellanox.com>
Reviewed-by: Eli Britstein <elibr@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
5 years agonet/mlx5e: Refactor VXLAN tunnel decap offloading code
Oz Shlomo [Wed, 14 Nov 2018 13:41:50 +0000 (15:41 +0200)]
net/mlx5e: Refactor VXLAN tunnel decap offloading code

Separate the vxlan header match handling from the matching on the
general fields of ipv4/6 tunnels, thus allowing the common IP tunnel
match code to branch to multiple IP tunnels in a downstream patch.

This patch doesn't add any functionality.

Signed-off-by: Oz Shlomo <ozsh@mellanox.com>
Reviewed-by: Eli Britstein <elibr@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
5 years agonet/mlx5e: Refactor VXLAN tunnel encap offloading code
Oz Shlomo [Wed, 14 Nov 2018 13:41:50 +0000 (15:41 +0200)]
net/mlx5e: Refactor VXLAN tunnel encap offloading code

Separate the vxlan header encap logic from the general ipv4/6
encapsulation methods, thus allowing the common IP encap/decap code to
branch to multiple IP tunnels in a downstream patch.

Code refactoring with no functional change.

Signed-off-by: Oz Shlomo <ozsh@mellanox.com>
Reviewed-by: Eli Britstein <elibr@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
5 years agonet/mlx5e: Replace egdev with indirect block notifications
Oz Shlomo [Sun, 28 Oct 2018 11:03:54 +0000 (13:03 +0200)]
net/mlx5e: Replace egdev with indirect block notifications

Use TC indirect block notifications to offload filters that
are configured on higher level device interfaces (e.g. tunnel
devices). This mechanism replaces the current egdev implementation.

Signed-off-by: Oz Shlomo <ozsh@mellanox.com>
Reviewed-by: Eli Britstein <elibr@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
5 years agonet/mlx5e: Propagate the filter's net device to mlx5e structures
Oz Shlomo [Sun, 28 Oct 2018 08:46:34 +0000 (10:46 +0200)]
net/mlx5e: Propagate the filter's net device to mlx5e structures

Propagate the filter's net_device parameter to the tc flower parsed
attributes structure so that it can later be used in tunnel decap
offloading sequences.

Pre-step for replacing egdev logic with the indirect block
notification mechanism.

Signed-off-by: Oz Shlomo <ozsh@mellanox.com>
Reviewed-by: Eli Britstein <elibr@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
5 years agonet/mlx5e: Provide the TC filter netdev as parameter to flower callbacks
Oz Shlomo [Sun, 28 Oct 2018 07:14:50 +0000 (09:14 +0200)]
net/mlx5e: Provide the TC filter netdev as parameter to flower callbacks

Currently the driver controls flower filters that are installed on its
devices. However, with the introduction of the indirect block
notifications platform the driver may receive control events for filters
that are installed on higher level net devices (e.g. tunnel devices).
Therefore, the driver filter control API will not be able to implicitly
assume the filter's net device.

Explicitly specify the filter's net device. No functional change.

Signed-off-by: Oz Shlomo <ozsh@mellanox.com>
Reviewed-by: Eli Britstein <elibr@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
5 years agonet/mlx5e: Support TC indirect block notifications for eswitch uplink reprs
Oz Shlomo [Sun, 28 Oct 2018 06:34:51 +0000 (08:34 +0200)]
net/mlx5e: Support TC indirect block notifications for eswitch uplink reprs

Towards using this mechanism as the means to offload tunnel decap rules
set on SW tunnel devices instead of egdev, add the supporting structures
and functions.

Signed-off-by: Oz Shlomo <ozsh@mellanox.com>
Reviewed-by: Eli Britstein <elibr@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
5 years agonet/mlx5e: Store eswitch uplink representor state on a dedicated struct
Oz Shlomo [Thu, 25 Oct 2018 18:51:11 +0000 (21:51 +0300)]
net/mlx5e: Store eswitch uplink representor state on a dedicated struct

Currently only a single field in the representor private structure
is relevant for uplink representors.  As a pre-step to allow adding
additional uplink representor fields, introduce uplink representor
private structure.

This is a preparation step towards replacing egdev logic with the
indirect block notification mechanism. This patch doesn't change
any functionality.

Signed-off-by: Oz Shlomo <ozsh@mellanox.com>
Reviewed-by: Eli Britstein <elibr@mellanox.com>
Acked-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
5 years agoMerge branch 'mlx5-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox...
Saeed Mahameed [Mon, 10 Dec 2018 23:43:47 +0000 (15:43 -0800)]
Merge branch 'mlx5-next' of git://git./linux/kernel/git/mellanox/linux

mlx5-next shared branch with rdma subtree to avoid mlx5 rdma vs. netdev
conflicts.

Highlights:

1) RDMA ODP (On Demand Paging) improvements and moving ODP logic to
mlx5 RDMA driver
2) Improved mlx5 core driver and device events handling and provided API
for upper layers to subscribe to device events.
3) RDMA only code cleanup from mlx5 core
4) Add helper to get CQE opcode
5) Rework handling of port module events
6) shared mlx5_ifc.h updates to avoid conflicts

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
5 years agoMerge branch 'rename-info_cnt-to-nr_info'
Alexei Starovoitov [Mon, 10 Dec 2018 22:51:45 +0000 (14:51 -0800)]
Merge branch 'rename-info_cnt-to-nr_info'

Yonghong Song says:

====================
Before func_info and line_info were added to the kernel, there were several
fields in structure bpf_prog_info specifying the "count" of a user buffer, e.g.,
        __u32 nr_jited_ksyms;
        __u32 nr_jited_func_lens;
The naming convention has the prefix "nr_".

The func_info and line_info support added several fields
        __u32 func_info_cnt;
        __u32 line_info_cnt;
        __u32 jited_line_info_cnt;
to indicate the "count" of buffers func_info, line_info and jited_line_info.
The original intention is to keep the field names the same as those in
structure bpf_attr, so it will be clear that the "count" returned to user
space will be the same as the one passed to the kernel during prog load.

Unfortunately, the field names *_info_cnt are not consistent with
other existing fields in bpf_prog_info.
This patch set renames the fields *_info_cnt to nr_*_info
to keep the naming convention consistent.
====================

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
5 years agotools/bpf: rename *_info_cnt to nr_*_info
Yonghong Song [Mon, 10 Dec 2018 22:14:10 +0000 (14:14 -0800)]
tools/bpf: rename *_info_cnt to nr_*_info

Rename all occurrences of *_info_cnt field access
to nr_*_info in the tools directory.

The local variables finfo_cnt, linfo_cnt and jited_linfo_cnt
in function do_dump() of tools/bpf/bpftool/prog.c are also
changed to nr_finfo, nr_linfo and nr_jited_linfo to
keep naming convention consistent.

Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
5 years agotools/bpf: sync kernel uapi bpf.h to tools directory
Yonghong Song [Mon, 10 Dec 2018 22:14:09 +0000 (14:14 -0800)]
tools/bpf: sync kernel uapi bpf.h to tools directory

Sync kernel uapi bpf.h "*_info_cnt => nr_*_info"
changes to tools directory.

Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
5 years agobpf: rename *_info_cnt to nr_*_info in bpf_prog_info
Yonghong Song [Mon, 10 Dec 2018 22:14:08 +0000 (14:14 -0800)]
bpf: rename *_info_cnt to nr_*_info in bpf_prog_info

In uapi bpf.h, currently we have the following fields in
the struct bpf_prog_info:
__u32 func_info_cnt;
__u32 line_info_cnt;
__u32 jited_line_info_cnt;
The above field names "func_info_cnt" and "line_info_cnt"
also appear in union bpf_attr for program loading.

The original intention is to keep the names the same
between bpf_prog_info and bpf_attr
so it will imply what we returned to user space will be
the same as what the user space passed to the kernel.

Such a naming convention in bpf_prog_info is not consistent
with other fields like:
        __u32 nr_jited_ksyms;
        __u32 nr_jited_func_lens;

This patch makes this adjustment so that the newly introduced
*_info_cnt fields in bpf_prog_info become nr_*_info.

Acked-by: Song Liu <songliubraving@fb.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
5 years agobpf: clean up bpf_prog_get_info_by_fd()
Song Liu [Mon, 10 Dec 2018 19:17:50 +0000 (11:17 -0800)]
bpf: clean up bpf_prog_get_info_by_fd()

info.nr_jited_ksyms and info.nr_jited_func_lens cannot be 0 in these two
statements, so we don't need to check them.

Signed-off-by: Song Liu <songliubraving@fb.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
5 years agonet/mlx5: Remove the get protocol device interface entry
Or Gerlitz [Mon, 10 Dec 2018 21:15:17 +0000 (13:15 -0800)]
net/mlx5: Remove the get protocol device interface entry

This isn't used anywhere across the mlx5 driver stack,
remove it.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
5 years agonet/mlx5: Support extended destination format in flow steering command
Eli Britstein [Mon, 10 Dec 2018 21:15:16 +0000 (13:15 -0800)]
net/mlx5: Support extended destination format in flow steering command

Update the flow steering command formatting according to the extended
destination API.
Note that the FW dictates that multi destination FTEs that involve at
least one encap must use the extended destination format, while single
destination ones must use the legacy format.
Using extended destination format requires FW support. Check for its
capabilities and return error if not supported.
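
The format selection rule above can be sketched as a small decision
function. This is an illustrative user-space sketch, not the driver
code: all sketch_* names are hypothetical, the -EOPNOTSUPP return value
is an assumption, and treating a multi destination FTE without any
encap as legacy is also an assumption, since the text above does not
cover that case.

```c
#include <errno.h>
#include <stdbool.h>

enum sketch_dest_format {
	SKETCH_DEST_FORMAT_LEGACY,
	SKETCH_DEST_FORMAT_EXTENDED,
};

/* Returns 0 and stores the chosen format in *format, or a negative
 * errno when the extended format is required but the FW lacks it.
 */
static int sketch_pick_dest_format(int num_dests, int num_encaps,
				   bool fw_supports_extended,
				   enum sketch_dest_format *format)
{
	/* Multi destination with at least one encap: extended format. */
	bool need_extended = num_dests > 1 && num_encaps > 0;

	if (!need_extended) {
		/* Single destination FTEs must use the legacy format;
		 * multi destination without encap is assumed legacy too.
		 */
		*format = SKETCH_DEST_FORMAT_LEGACY;
		return 0;
	}

	if (!fw_supports_extended)
		return -EOPNOTSUPP;	/* assumed error code */

	*format = SKETCH_DEST_FORMAT_EXTENDED;
	return 0;
}
```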

Signed-off-by: Eli Britstein <elibr@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Oz Shlomo <ozsh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
5 years agonet/mlx5: E-Switch, Change vhca id valid bool field to bit flag
Eli Britstein [Mon, 10 Dec 2018 21:15:15 +0000 (13:15 -0800)]
net/mlx5: E-Switch, Change vhca id valid bool field to bit flag

Change the driver flow destination struct to use bit flags with the vhca
id valid being the 1st one. The flags field is more extensible and will
be used in a downstream patch.
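
A minimal sketch of the bool to bit flag conversion, assuming a
simplified destination struct. The struct, flag and function names
below are hypothetical and only illustrate the pattern; they are not
the driver's actual definitions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag names; bit 0 carries what used to be a bool field. */
enum sketch_dest_flags {
	SKETCH_DEST_VHCA_ID_VALID = 1u << 0,
	/* further flags can be added here without changing the struct */
};

struct sketch_flow_dest {
	uint16_t vhca_id;
	uint32_t flags;		/* replaces: bool vhca_id_valid; */
};

static void sketch_dest_set_vhca_id(struct sketch_flow_dest *dest,
				    uint16_t id)
{
	dest->vhca_id = id;
	dest->flags |= SKETCH_DEST_VHCA_ID_VALID;
}

static bool sketch_dest_vhca_id_valid(const struct sketch_flow_dest *dest)
{
	return (dest->flags & SKETCH_DEST_VHCA_ID_VALID) != 0;
}
```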

Signed-off-by: Eli Britstein <elibr@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Oz Shlomo <ozsh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
5 years agonet/mlx5: Introduce extended destination fields
Eli Britstein [Mon, 10 Dec 2018 21:15:14 +0000 (13:15 -0800)]
net/mlx5: Introduce extended destination fields

Extended destinations provide the ability to configure different
encapsulation properties per destination on a single FTE. This is
needed for use-cases such as remote mirroring over tunneled networks.

Signed-off-by: Eli Britstein <elibr@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Oz Shlomo <ozsh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
5 years agonet/mlx5: Revise gre and nvgre key formats
Oz Shlomo [Mon, 10 Dec 2018 21:15:13 +0000 (13:15 -0800)]
net/mlx5: Revise gre and nvgre key formats

GRE RFC defines a 32 bit key field. NVGRE RFC splits the 32 bit
key field to 24 bit VSID (gre_key_h) and 8 bit flow entropy (gre_key_l).

Define the two key parsing alternatives in a union, thus enabling both
access methods.
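
The two views of the key can be sketched with a byte-level union. This
is an assumption-laden illustration rather than the mlx5_ifc layout:
the union and helper names are hypothetical, and the key is stored in
wire (network) byte order so that the first three bytes hold the VSID
and the last byte holds the flow entropy.

```c
#include <stdint.h>

/* Hypothetical layout: the same 4 bytes viewed either as one 32-bit
 * GRE key or as the NVGRE split into VSID and flow entropy.
 */
union sketch_gre_key {
	uint8_t key[4];		/* GRE: 32-bit key, network byte order */
	struct {
		uint8_t hi[3];	/* NVGRE: 24-bit VSID (gre_key_h) */
		uint8_t lo;	/* NVGRE: 8-bit flow entropy (gre_key_l) */
	} nvgre;
};

/* Store a host-order 32-bit key in network byte order. */
static void sketch_gre_key_set(union sketch_gre_key *k, uint32_t key)
{
	k->key[0] = (uint8_t)(key >> 24);
	k->key[1] = (uint8_t)(key >> 16);
	k->key[2] = (uint8_t)(key >> 8);
	k->key[3] = (uint8_t)key;
}

/* Read the upper 24 bits through the NVGRE view. */
static uint32_t sketch_gre_key_vsid(const union sketch_gre_key *k)
{
	return ((uint32_t)k->nvgre.hi[0] << 16) |
	       ((uint32_t)k->nvgre.hi[1] << 8) |
	       (uint32_t)k->nvgre.hi[2];
}

/* Read the lower 8 bits through the NVGRE view. */
static uint8_t sketch_gre_key_flow(const union sketch_gre_key *k)
{
	return k->nvgre.lo;
}
```

Byte arrays are used instead of C bitfields so that the sketch does not
depend on the compiler's bitfield ordering.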

Signed-off-by: Oz Shlomo <ozsh@mellanox.com>
Reviewed-by: Eli Britstein <elibr@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
5 years agonet/mlx5: Add monitor commands layout and event data
Eyal Davidovich [Mon, 10 Dec 2018 21:15:12 +0000 (13:15 -0800)]
net/mlx5: Add monitor commands layout and event data

Will be used in a downstream patch to monitor counter changes
by the HCA and report them to the driver via an event.
The driver will then update its cached counter data accordingly.

Signed-off-by: Eyal Davidovich <eyald@mellanox.com>
Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
5 years agonet/mlx5: Add support for plugged-disabled cable status in PME
Mikhael Goikhman [Mon, 10 Dec 2018 21:15:11 +0000 (13:15 -0800)]
net/mlx5: Add support for plugged-disabled cable status in PME

Support a new hardware module status in port module events:
- module_status=0x4 (Cable plugged, but disabled)

Signed-off-by: Mikhael Goikhman <migo@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>