platform/kernel/linux-starfive.git
21 months agobpf/docs: Document the nocast aliasing behavior of ___init
David Vernet [Wed, 25 Jan 2023 14:38:16 +0000 (08:38 -0600)]
bpf/docs: Document the nocast aliasing behavior of ___init

When comparing BTF IDs for pointers being passed to kfunc arguments, the
verifier will allow pointer types that are equivalent according to the C
standard. For example, for:

struct bpf_cpumask {
cpumask_t cpumask;
refcount_t usage;
};

The verifier will allow a struct bpf_cpumask * to be passed to a kfunc
that takes a const struct cpumask * (cpumask_t is a typedef of struct
cpumask). The exception to this rule is if a type is suffixed with
___init, such as:

struct nf_conn___init {
struct nf_conn ct;
};

The verifier will _not_ allow a struct nf_conn___init * to be passed to
a kfunc that expects a struct nf_conn *. This patch documents this
behavior in the kfuncs documentation page.

Signed-off-by: David Vernet <void@manifault.com>
Link: https://lore.kernel.org/r/20230125143816.721952-8-void@manifault.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
21 months agobpf/docs: Document how nested trusted fields may be defined
David Vernet [Wed, 25 Jan 2023 14:38:15 +0000 (08:38 -0600)]
bpf/docs: Document how nested trusted fields may be defined

A prior change defined a new BTF_TYPE_SAFE_NESTED macro in the verifier
which allows developers to specify when a pointee field in a struct type
should inherit its parent pointer's trusted status. This patch updates
the kfuncs documentation to specify this macro and how it can be used.

Signed-off-by: David Vernet <void@manifault.com>
Link: https://lore.kernel.org/r/20230125143816.721952-7-void@manifault.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
21 months agobpf/docs: Document cpumask kfuncs in a new file
David Vernet [Wed, 25 Jan 2023 14:38:14 +0000 (08:38 -0600)]
bpf/docs: Document cpumask kfuncs in a new file

Now that we've added a series of new cpumask kfuncs, we should document
them so users can easily use them. This patch adds a new cpumasks.rst
file to document them.

Signed-off-by: David Vernet <void@manifault.com>
Link: https://lore.kernel.org/r/20230125143816.721952-6-void@manifault.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
21 months agoselftests/bpf: Add selftest suite for cpumask kfuncs
David Vernet [Wed, 25 Jan 2023 14:38:13 +0000 (08:38 -0600)]
selftests/bpf: Add selftest suite for cpumask kfuncs

A recent patch added a new set of kfuncs for allocating, freeing,
manipulating, and querying cpumasks. This patch adds a new 'cpumask'
selftest suite which verifies their behavior.

Signed-off-by: David Vernet <void@manifault.com>
Link: https://lore.kernel.org/r/20230125143816.721952-5-void@manifault.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
21 months agoselftests/bpf: Add nested trust selftests suite
David Vernet [Wed, 25 Jan 2023 14:38:12 +0000 (08:38 -0600)]
selftests/bpf: Add nested trust selftests suite

Now that defining trusted fields in a struct is supported, we should add
selftests to verify the behavior. This patch adds a few such testcases.

Signed-off-by: David Vernet <void@manifault.com>
Link: https://lore.kernel.org/r/20230125143816.721952-4-void@manifault.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
21 months agobpf: Enable cpumasks to be queried and used as kptrs
David Vernet [Wed, 25 Jan 2023 14:38:11 +0000 (08:38 -0600)]
bpf: Enable cpumasks to be queried and used as kptrs

Certain programs may wish to be able to query cpumasks. For example, if
a program that is tracing percpu operations wishes to track which tasks
end up running on which CPUs, it could be useful to associate that with
the tasks' cpumasks. Similarly, programs tracking NUMA allocations, CPU
scheduling domains, etc, could potentially benefit from being able to
see which CPUs a task could be migrated to.

This patch enables these types of use cases by introducing a series of
bpf_cpumask_* kfuncs. Amongst these kfuncs, there are two separate
"classes" of operations:

1. kfuncs which allow the caller to allocate and mutate their own
   cpumask kptrs in the form of a struct bpf_cpumask * object. Such
   kfuncs include e.g. bpf_cpumask_create() to allocate the cpumask, and
   bpf_cpumask_or() to mutate it. "Regular" cpumasks such as p->cpus_ptr
   may not be passed to these kfuncs, and the verifier will ensure this
   is the case by comparing BTF IDs.

2. Read-only operations which operate on const struct cpumask *
   arguments. For example, bpf_cpumask_test_cpu(), which tests whether a
   CPU is set in the cpumask. Any trusted struct cpumask * or struct
   bpf_cpumask * may be passed to these kfuncs. The verifier allows
   struct bpf_cpumask * even though the kfunc is defined with struct
   cpumask * because the first element of a struct bpf_cpumask is a
   cpumask_t, so it is safe to cast.

A follow-on patch will add selftests which validate these kfuncs, and
another will document them.

Signed-off-by: David Vernet <void@manifault.com>
Link: https://lore.kernel.org/r/20230125143816.721952-3-void@manifault.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
21 months agobpf: Disallow NULLable pointers for trusted kfuncs
David Vernet [Wed, 25 Jan 2023 14:38:10 +0000 (08:38 -0600)]
bpf: Disallow NULLable pointers for trusted kfuncs

KF_TRUSTED_ARGS kfuncs currently have a subtle and insidious bug in
validating pointers to scalars. Say that you have a kfunc like the
following, which takes an array as the first argument:

bool bpf_cpumask_empty(const struct cpumask *cpumask)
{
return cpumask_empty(cpumask);
}

...
BTF_ID_FLAGS(func, bpf_cpumask_empty, KF_TRUSTED_ARGS)
...

If a BPF program were to invoke the kfunc with a NULL argument, it would
crash the kernel. The reason is that struct cpumask is defined as a
bitmap, which is itself defined as an array, and is accessed as a memory
address by bitmap operations. So when the verifier analyzes the
register, it interprets it as a pointer to a scalar struct, which is an
array of size 8. check_mem_reg() then sees that the register is NULL and
returns 0, and the kfunc crashes when it passes it down to the cpumask
wrappers.

To fix this, this patch adds a check for KF_ARG_PTR_TO_MEM which
verifies that the register doesn't contain a possibly-NULL pointer if
the kfunc is KF_TRUSTED_ARGS.

Signed-off-by: David Vernet <void@manifault.com>
Link: https://lore.kernel.org/r/20230125143816.721952-2-void@manifault.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
21 months agobpf: Allow trusted args to walk struct when checking BTF IDs
David Vernet [Fri, 20 Jan 2023 19:25:16 +0000 (13:25 -0600)]
bpf: Allow trusted args to walk struct when checking BTF IDs

When validating BTF types for KF_TRUSTED_ARGS kfuncs, the verifier
currently enforces that the top-level type must match when calling
the kfunc. In other words, the verifier does not allow the BPF program
to pass a bitwise equivalent struct, despite it being allowed according
to the C standard.

For example, if you have the following type:

struct  nf_conn___init {
struct nf_conn ct;
};

The C standard stipulates that it would be safe to pass a struct
nf_conn___init to a kfunc expecting a struct nf_conn. The verifier
currently disallows this, however, as semantically kfuncs may want to
enforce that structs that have equivalent types according to the C
standard, but have different BTF IDs, are not able to be passed to
kfuncs expecting one or the other. For example, struct nf_conn___init
may not be queried / looked up, as it is allocated but may not yet be
fully initialized.

On the other hand, being able to pass types that are equivalent
according to the C standard will be useful for other types of kfunc /
kptrs enabled by BPF.  For example, in a follow-on patch, a series of
kfuncs will be added which allow programs to do bitwise queries on
cpumasks that are either allocated by the program (in which case they'll
be a 'struct bpf_cpumask' type that wraps a cpumask_t as its first
element), or a cpumask that was allocated by the main kernel (in which
case it will just be a straight cpumask_t, as in task->cpus_ptr).

Having the two types of cpumasks allows us to distinguish between the
two for when a cpumask is read-only vs. mutatable. A struct bpf_cpumask
can be mutated by e.g. bpf_cpumask_clear(), whereas a regular cpumask_t
cannot be. On the other hand, a struct bpf_cpumask can of course be
queried in the exact same manner as a cpumask_t, with e.g.
bpf_cpumask_test_cpu().

If we were to enforce that top level types match, then a user that's
passing a struct bpf_cpumask to a read-only cpumask_t argument would
have to cast with something like bpf_cast_to_kern_ctx() (which itself
would need to be updated to expect the alias, and currently it only
accommodates a single alias per prog type). Additionally, not specifying
KF_TRUSTED_ARGS is not an option, as some kfuncs take one argument as a
struct bpf_cpumask *, and another as a struct cpumask *
(i.e. cpumask_t).

In order to enable this, this patch relaxes the constraint that a
KF_TRUSTED_ARGS kfunc must have strict type matching, and instead only
enforces strict type matching if a type is observed to be a "no-cast
alias" (i.e., that the type names are equivalent, but one is suffixed
with ___init).

Additionally, in order to try and be conservative and match existing
behavior / expectations, this patch also enforces strict type checking
for acquire kfuncs. We were already enforcing it for release kfuncs, so
this should also improve the consistency of the semantics for kfuncs.

Signed-off-by: David Vernet <void@manifault.com>
Link: https://lore.kernel.org/r/20230120192523.3650503-3-void@manifault.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
21 months agobpf: Enable annotating trusted nested pointers
David Vernet [Fri, 20 Jan 2023 19:25:15 +0000 (13:25 -0600)]
bpf: Enable annotating trusted nested pointers

In kfuncs, a "trusted" pointer is a pointer that the kfunc can assume is
safe, and which the verifier will allow to be passed to a
KF_TRUSTED_ARGS kfunc. Currently, a KF_TRUSTED_ARGS kfunc disallows any
pointer to be passed at a nonzero offset, but sometimes this is in fact
safe if the "nested" pointer's lifetime is inherited from its parent.
For example, the const cpumask_t *cpus_ptr field in a struct task_struct
will remain valid until the task itself is destroyed, and thus would
also be safe to pass to a KF_TRUSTED_ARGS kfunc.

While it would be conceptually simple to enable this by using BTF tags,
gcc unfortunately does not yet support this. In the interim, this patch
enables support for this by using a type-naming convention. A new
BTF_TYPE_SAFE_NESTED macro is defined in verifier.c which allows a
developer to specify the nested fields of a type which are considered
trusted if its parent is also trusted. The verifier is also updated to
account for this. A patch with selftests will be added in a follow-on
change, along with documentation for this feature.

Signed-off-by: David Vernet <void@manifault.com>
Link: https://lore.kernel.org/r/20230120192523.3650503-2-void@manifault.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
21 months agobpf, docs: Fix modulo zero, division by zero, overflow, and underflow
Dave Thaler [Tue, 24 Jan 2023 00:12:18 +0000 (00:12 +0000)]
bpf, docs: Fix modulo zero, division by zero, overflow, and underflow

Fix modulo zero, division by zero, overflow, and underflow. Also clarify how
a negative immediate value is used in unsigned division.

Signed-off-by: Dave Thaler <dthaler@microsoft.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20230124001218.827-1-dthaler1968@googlemail.com
21 months agoMerge branch 'libbpf-extend-arguments-tracing'
Daniel Borkmann [Mon, 23 Jan 2023 19:53:02 +0000 (20:53 +0100)]
Merge branch 'libbpf-extend-arguments-tracing'

Andrii Nakryiko says:

====================
This patch set fixes and extends libbpf's bpf_tracing.h support for tracing
arguments of kprobes/uprobes, and syscall as a special case.

Depending on the architecture, anywhere between 3 and 8 arguments can be
passed to a function in registers (so relevant to kprobes and uprobes), but
before this patch set libbpf's macros in bpf_tracing.h only supported up to
5 arguments, which is limiting in practice. This patch set extends
bpf_tracing.h to support up to 8 arguments, if architecture allows. This
includes explicit PT_REGS_PARMx() macro family, as well as BPF_KPROBE() macro.

Now, with tracing syscall arguments situation is sometimes quite different.
For a lot of architectures syscall argument passing through registers differs
from function call sequence at least a little. For i386 it differs *a lot*.
This patch set addresses this issue across all currently supported
architectures and hopefully fixes existing issues. syscall(2) manpage defines
that either 6 or 7 arguments can be supported, depending on architecture, so
libbpf defines 6 or 7 registers per architecture to be used to fetch syscall
arguments.

Also, BPF_UPROBE and BPF_URETPROBE are introduced as part of this patch set.
They are aliases for BPF_KPROBE and BPF_KRETPROBE (as mechanics of argument
fetching of kernel functions and user-space functions are identical), but it
allows BPF users to have less confusing BPF-side code when working with
uprobes.

For both sets of changes selftests are extended to test these new register
definitions to architecture-defined limits. Unfortunately I don't have ability
to test it on all architectures, and BPF CI only tests 3 architecture (x86-64,
arm64, and s390x), so it would be greatly appreciated if people with access to
architectures other than above 3 helped review and test changes.
====================

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
21 months agolibbpf: Clean up now not needed __PT_PARM{1-6}_SYSCALL_REG defaults
Andrii Nakryiko [Fri, 20 Jan 2023 20:09:14 +0000 (12:09 -0800)]
libbpf: Clean up now not needed __PT_PARM{1-6}_SYSCALL_REG defaults

Each architecture supports at least 6 syscall argument registers, so now
that specs for each architecture is defined in bpf_tracing.h, remove
unnecessary macro overrides, which previously were required to keep
existing BPF_KSYSCALL() uses compiling and working.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20230120200914.3008030-26-andrii@kernel.org
21 months agoselftests/bpf: Add 6-argument syscall tracing test
Andrii Nakryiko [Fri, 20 Jan 2023 20:09:13 +0000 (12:09 -0800)]
selftests/bpf: Add 6-argument syscall tracing test

Turns out splice() is one of the syscalls that's using current maximum
number of arguments (six). This is perfect for testing, so extend
bpf_syscall_macro selftest to also trace splice() syscall, using
BPF_KSYSCALL() macro. This makes sure all the syscall argument register
definitions are correct.

Suggested-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: Alan Maguire <alan.maguire@oracle.com> # arm64
Tested-by: Ilya Leoshkevich <iii@linux.ibm.com> # s390x
Link: https://lore.kernel.org/bpf/20230120200914.3008030-25-andrii@kernel.org
21 months agolibbpf: Define loongarch syscall regs spec in bpf_tracing.h
Andrii Nakryiko [Fri, 20 Jan 2023 20:09:12 +0000 (12:09 -0800)]
libbpf: Define loongarch syscall regs spec in bpf_tracing.h

Define explicit table of registers used for syscall argument passing.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20230120200914.3008030-24-andrii@kernel.org
21 months agolibbpf: Define arc syscall regs spec in bpf_tracing.h
Andrii Nakryiko [Fri, 20 Jan 2023 20:09:11 +0000 (12:09 -0800)]
libbpf: Define arc syscall regs spec in bpf_tracing.h

Define explicit table of registers used for syscall argument passing.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20230120200914.3008030-23-andrii@kernel.org
21 months agolibbpf: Define riscv syscall regs spec in bpf_tracing.h
Andrii Nakryiko [Fri, 20 Jan 2023 20:09:10 +0000 (12:09 -0800)]
libbpf: Define riscv syscall regs spec in bpf_tracing.h

Define explicit table of registers used for syscall argument passing.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: Pu Lehui <pulehui@huawei.com> # RISC-V
Link: https://lore.kernel.org/bpf/20230120200914.3008030-22-andrii@kernel.org
21 months agolibbpf: Define sparc syscall regs spec in bpf_tracing.h
Andrii Nakryiko [Fri, 20 Jan 2023 20:09:09 +0000 (12:09 -0800)]
libbpf: Define sparc syscall regs spec in bpf_tracing.h

Define explicit table of registers used for syscall argument passing.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20230120200914.3008030-21-andrii@kernel.org
21 months agolibbpf: Define powerpc syscall regs spec in bpf_tracing.h
Andrii Nakryiko [Fri, 20 Jan 2023 20:09:08 +0000 (12:09 -0800)]
libbpf: Define powerpc syscall regs spec in bpf_tracing.h

Define explicit table of registers used for syscall argument passing.
Note that 7th arg is supported on 32-bit powerpc architecture, by not on
powerpc64.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20230120200914.3008030-20-andrii@kernel.org
21 months agolibbpf: Define mips syscall regs spec in bpf_tracing.h
Andrii Nakryiko [Fri, 20 Jan 2023 20:09:07 +0000 (12:09 -0800)]
libbpf: Define mips syscall regs spec in bpf_tracing.h

Define explicit table of registers used for syscall argument passing.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20230120200914.3008030-19-andrii@kernel.org
21 months agolibbpf: Define arm64 syscall regs spec in bpf_tracing.h
Andrii Nakryiko [Fri, 20 Jan 2023 20:09:06 +0000 (12:09 -0800)]
libbpf: Define arm64 syscall regs spec in bpf_tracing.h

Define explicit table of registers used for syscall argument passing.
We need PT_REGS_PARM1_[CORE_]SYSCALL macros overrides, similarly to
s390x, due to orig_x0 not being present in UAPI's pt_regs, so we need to
utilize BPF CO-RE and custom pt_regs___arm64 definition.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: Alan Maguire <alan.maguire@oracle.com> # arm64
Link: https://lore.kernel.org/bpf/20230120200914.3008030-18-andrii@kernel.org
21 months agolibbpf: Define arm syscall regs spec in bpf_tracing.h
Andrii Nakryiko [Fri, 20 Jan 2023 20:09:05 +0000 (12:09 -0800)]
libbpf: Define arm syscall regs spec in bpf_tracing.h

Define explicit table of registers used for syscall argument passing.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20230120200914.3008030-17-andrii@kernel.org
21 months agolibbpf: Define s390x syscall regs spec in bpf_tracing.h
Andrii Nakryiko [Fri, 20 Jan 2023 20:09:04 +0000 (12:09 -0800)]
libbpf: Define s390x syscall regs spec in bpf_tracing.h

Define explicit table of registers used for syscall argument passing.
Note that we need custom overrides for PT_REGS_PARM1_[CORE_]SYSCALL
macros due to the need to use BPF CO-RE and custom local pt_regs
definitions to fetch orig_gpr2, storing 1st argument.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: Ilya Leoshkevich <iii@linux.ibm.com> # s390x
Acked-by: Ilya Leoshkevich <iii@linux.ibm.com>
Link: https://lore.kernel.org/bpf/20230120200914.3008030-16-andrii@kernel.org
21 months agolibbpf: Define i386 syscall regs spec in bpf_tracing.h
Andrii Nakryiko [Fri, 20 Jan 2023 20:09:03 +0000 (12:09 -0800)]
libbpf: Define i386 syscall regs spec in bpf_tracing.h

Define explicit table of registers used for syscall argument passing.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20230120200914.3008030-15-andrii@kernel.org
21 months agolibbpf: Define x86-64 syscall regs spec in bpf_tracing.h
Andrii Nakryiko [Fri, 20 Jan 2023 20:09:02 +0000 (12:09 -0800)]
libbpf: Define x86-64 syscall regs spec in bpf_tracing.h

Define explicit table of registers used for syscall argument passing.
Remove now unnecessary overrides of PT_REGS_PARM5_[CORE_]SYSCALL macros.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20230120200914.3008030-14-andrii@kernel.org
21 months agolibbpf: Improve syscall tracing support in bpf_tracing.h
Andrii Nakryiko [Fri, 20 Jan 2023 20:09:01 +0000 (12:09 -0800)]
libbpf: Improve syscall tracing support in bpf_tracing.h

Set up generic support in bpf_tracing.h for up to 7 syscall arguments
tracing with BPF_KSYSCALL, which seems to be the limit according to
syscall(2) manpage. Also change the way that syscall convention is
specified to be more explicit. Subsequent patches will adjust and define
proper per-architecture syscall conventions.

__PT_PARM1_SYSCALL_REG through __PT_PARM6_SYSCALL_REG is added
temporarily to keep everything working before each architecture has
syscall reg tables defined. They will be removed afterwards.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: Alan Maguire <alan.maguire@oracle.com> # arm64
Link: https://lore.kernel.org/bpf/20230120200914.3008030-13-andrii@kernel.org
21 months agoselftests/bpf: Validate arch-specific argument registers limits
Andrii Nakryiko [Fri, 20 Jan 2023 20:09:00 +0000 (12:09 -0800)]
selftests/bpf: Validate arch-specific argument registers limits

Update uprobe_autoattach selftest to validate architecture-specific
argument passing through registers. Use new BPF_UPROBE and
BPF_URETPROBE, and construct both BPF-side and user-space side in such
a way that for different architectures we are fetching and checking
different number of arguments, matching architecture-specific limit of
how many registers are available for argument passing.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: Alan Maguire <alan.maguire@oracle.com> # arm64
Tested-by: Ilya Leoshkevich <iii@linux.ibm.com> # s390x
Link: https://lore.kernel.org/bpf/20230120200914.3008030-12-andrii@kernel.org
21 months agolibbpf: Add BPF_UPROBE and BPF_URETPROBE macro aliases
Andrii Nakryiko [Fri, 20 Jan 2023 20:08:59 +0000 (12:08 -0800)]
libbpf: Add BPF_UPROBE and BPF_URETPROBE macro aliases

Add BPF_UPROBE and BPF_URETPROBE macros, aliased to BPF_KPROBE and
BPF_KRETPROBE, respectively. This makes uprobe-based BPF program code
much less confusing, especially to people new to tracing, at no cost in
terms of maintainability. We'll use this macro in selftests in
subsequent patch.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20230120200914.3008030-11-andrii@kernel.org
21 months agolibbpf: Complete LoongArch (loongarch) spec in bpf_tracing.h
Andrii Nakryiko [Fri, 20 Jan 2023 20:08:58 +0000 (12:08 -0800)]
libbpf: Complete LoongArch (loongarch) spec in bpf_tracing.h

Add PARM6 through PARM8 definitions. Add kernel docs link describing ABI
for LoongArch.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20230120200914.3008030-10-andrii@kernel.org
21 months agolibbpf: Fix and complete ARC spec in bpf_tracing.h
Andrii Nakryiko [Fri, 20 Jan 2023 20:08:57 +0000 (12:08 -0800)]
libbpf: Fix and complete ARC spec in bpf_tracing.h

Add PARM6 through PARM8 definitions. Also fix frame pointer (FP)
register definition. Also leave a link to where to find ABI spec.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20230120200914.3008030-9-andrii@kernel.org
21 months agolibbpf: Complete riscv arch spec in bpf_tracing.h
Andrii Nakryiko [Fri, 20 Jan 2023 20:08:56 +0000 (12:08 -0800)]
libbpf: Complete riscv arch spec in bpf_tracing.h

Add PARM6 through PARM8 definitions for RISC V (riscv) arch. Leave the
link for ABI doc for future reference.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: Pu Lehui <pulehui@huawei.com> # RISC-V
Link: https://lore.kernel.org/bpf/20230120200914.3008030-8-andrii@kernel.org
21 months agolibbpf: Complete sparc spec in bpf_tracing.h
Andrii Nakryiko [Fri, 20 Jan 2023 20:08:55 +0000 (12:08 -0800)]
libbpf: Complete sparc spec in bpf_tracing.h

Add PARM6 definition for sparc architecture. Leave a link to calling
convention documentation.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20230120200914.3008030-7-andrii@kernel.org
21 months agolibbpf: Complete powerpc spec in bpf_tracing.h
Andrii Nakryiko [Fri, 20 Jan 2023 20:08:54 +0000 (12:08 -0800)]
libbpf: Complete powerpc spec in bpf_tracing.h

Add definitions of PARM6 through PARM8 for powerpc architecture. Add
also a link to a functiona call sequence documentation for future reference.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20230120200914.3008030-6-andrii@kernel.org
21 months agolibbpf: Complete mips spec in bpf_tracing.h
Andrii Nakryiko [Fri, 20 Jan 2023 20:08:53 +0000 (12:08 -0800)]
libbpf: Complete mips spec in bpf_tracing.h

Add registers for PARM6 through PARM8. Add a link to an ABI. We don't
distinguish between O32, N32, and N64, so document that we assume N64
right now.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20230120200914.3008030-5-andrii@kernel.org
21 months agolibbpf: Fix arm and arm64 specs in bpf_tracing.h
Andrii Nakryiko [Fri, 20 Jan 2023 20:08:52 +0000 (12:08 -0800)]
libbpf: Fix arm and arm64 specs in bpf_tracing.h

Remove invalid support for PARM5 on 32-bit arm, as per ABI. Add three
more argument registers for arm64. Also leave links to ABI specs for
future reference.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: Alan Maguire <alan.maguire@oracle.com> # arm64
Link: https://lore.kernel.org/bpf/20230120200914.3008030-4-andrii@kernel.org
21 months agolibbpf: Add 6th argument support for x86-64 in bpf_tracing.h
Andrii Nakryiko [Fri, 20 Jan 2023 20:08:51 +0000 (12:08 -0800)]
libbpf: Add 6th argument support for x86-64 in bpf_tracing.h

Add r9 as register containing 6th argument on x86-64 architecture, as
per its ABI. Add also a link to a page describing ABI for easier future
reference.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20230120200914.3008030-3-andrii@kernel.org
21 months agolibbpf: Add support for fetching up to 8 arguments in kprobes
Andrii Nakryiko [Fri, 20 Jan 2023 20:08:50 +0000 (12:08 -0800)]
libbpf: Add support for fetching up to 8 arguments in kprobes

Add BPF_KPROBE() and PT_REGS_PARMx() support for up to 8 arguments, if
target architecture supports this. Currently all architectures are
limited to only 5 register-placed arguments, which is limiting even on
x86-64.

This patch adds generic macro machinery to support up to 8 arguments
both when explicitly fetching it from pt_regs through PT_REGS_PARMx()
macros, as well as more ergonomic access in BPF_KPROBE().

Also, for i386 architecture we now don't have to define fake PARM4 and
PARM5 definitions, they will be generically substituted, just like for
PARM6 through PARM8.

Subsequent patches will fill out architecture-specific definitions,
where appropriate.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: Alan Maguire <alan.maguire@oracle.com> # arm64
Tested-by: Ilya Leoshkevich <iii@linux.ibm.com> # s390x
Link: https://lore.kernel.org/bpf/20230120200914.3008030-2-andrii@kernel.org
21 months agoselftests/bpf: Use __failure macro in task kfunc testsuite
David Vernet [Fri, 20 Jan 2023 02:18:44 +0000 (20:18 -0600)]
selftests/bpf: Use __failure macro in task kfunc testsuite

In commit 537c3f66eac1 ("selftests/bpf: add generic BPF program tester-loader"),
a new mechanism was added to the BPF selftest framework to allow testsuites to
use macros to define expected failing testcases.

This allows any testsuite which tests verification failure to remove a good
amount of boilerplate code. This patch updates the task_kfunc selftest suite
to use these new macros.

Signed-off-by: David Vernet <void@manifault.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20230120021844.3048244-1-void@manifault.com
21 months agoMerge branch 'xdp: hints via kfuncs'
Martin KaFai Lau [Mon, 23 Jan 2023 17:38:12 +0000 (09:38 -0800)]
Merge branch 'xdp: hints via kfuncs'

Stanislav Fomichev says:

====================

Please see the first patch in the series for the overall
design and use-cases.

See the following email from Toke for the per-packet metadata overhead:
https://lore.kernel.org/bpf/20221206024554.3826186-1-sdf@google.com/T/#m49d48ea08d525ec88360c7d14c4d34fb0e45e798

Recent changes:
- Keep new functions in en/xdp.c, do 'extern mlx5_xdp_metadata_ops' (Tariq)

- Remove mxbuf pointer and use xsk_buff_to_mxbuf (Tariq)

- Clarify xdp_buff vs 'XDP frame' (Jesper)

- Explicitly mention that AF_XDP RX descriptor lacks metadata size (Jesper)

- Drop libbpf_flags/xdp_flags from selftests and use ifindex instead
  of ifname (due to recent xsk.h refactoring)

Prior art (to record pros/cons for different approaches):

- Stable UAPI approach:
  https://lore.kernel.org/bpf/20220628194812.1453059-1-alexandr.lobakin@intel.com/
- Metadata+BTF_ID appoach:
  https://lore.kernel.org/bpf/166256538687.1434226.15760041133601409770.stgit@firesoul/
- v7:
  https://lore.kernel.org/bpf/20230112003230.3779451-1-sdf@google.com/
- v6:
  https://lore.kernel.org/bpf/20230104215949.529093-1-sdf@google.com/
- v5:
  https://lore.kernel.org/bpf/20221220222043.3348718-1-sdf@google.com/
- v4:
  https://lore.kernel.org/bpf/20221213023605.737383-1-sdf@google.com/
- v3:
  https://lore.kernel.org/bpf/20221206024554.3826186-1-sdf@google.com/
- v2:
  https://lore.kernel.org/bpf/20221121182552.2152891-1-sdf@google.com/
- v1:
  https://lore.kernel.org/bpf/20221115030210.3159213-1-sdf@google.com/
- kfuncs v2 RFC:
  https://lore.kernel.org/bpf/20221027200019.4106375-1-sdf@google.com/
- kfuncs v1 RFC:
  https://lore.kernel.org/bpf/20221104032532.1615099-1-sdf@google.com/

Cc: John Fastabend <john.fastabend@gmail.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Martin KaFai Lau <martin.lau@linux.dev>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Willem de Bruijn <willemb@google.com>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Cc: Anatoly Burakov <anatoly.burakov@intel.com>
Cc: Alexander Lobakin <alexandr.lobakin@intel.com>
Cc: Magnus Karlsson <magnus.karlsson@gmail.com>
Cc: Maryam Tahhan <mtahhan@redhat.com>
Cc: xdp-hints@xdp-project.net
Cc: netdev@vger.kernel.org
Stanislav Fomichev (13):
  bpf: Document XDP RX metadata
  bpf: Rename bpf_{prog,map}_is_dev_bound to is_offloaded
  bpf: Move offload initialization into late_initcall
  bpf: Reshuffle some parts of bpf/offload.c
  bpf: Introduce device-bound XDP programs
  selftests/bpf: Update expected test_offload.py messages
  bpf: XDP metadata RX kfuncs
  veth: Introduce veth_xdp_buff wrapper for xdp_buff
  veth: Support RX XDP metadata
  selftests/bpf: Verify xdp_metadata xdp->af_xdp path
  net/mlx4_en: Introduce wrapper for xdp_buff
  net/mlx4_en: Support RX XDP metadata
  selftests/bpf: Simple program to dump XDP RX metadata
====================

Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
21 months agoselftests/bpf: Simple program to dump XDP RX metadata
Stanislav Fomichev [Thu, 19 Jan 2023 22:15:36 +0000 (14:15 -0800)]
selftests/bpf: Simple program to dump XDP RX metadata

To be used for verification of driver implementations. Note that
the skb path is gone from the series, but I'm still keeping the
implementation for any possible future work.

$ xdp_hw_metadata <ifname>

On the other machine:

$ echo -n xdp | nc -u -q1 <target> 9091 # for AF_XDP
$ echo -n skb | nc -u -q1 <target> 9092 # for skb

Sample output:

  # xdp
  xsk_ring_cons__peek: 1
  0x19f9090: rx_desc[0]->addr=100000000008000 addr=8100 comp_addr=8000
  rx_timestamp_supported: 1
  rx_timestamp: 1667850075063948829
  0x19f9090: complete idx=8 addr=8000

  # skb
  found skb hwtstamp = 1668314052.854274681

Decoding:
  # xdp
  rx_timestamp=1667850075.063948829

  $ date -d @1667850075
  Mon Nov  7 11:41:15 AM PST 2022
  $ date
  Mon Nov  7 11:42:05 AM PST 2022

  # skb
  $ date -d @1668314052
  Sat Nov 12 08:34:12 PM PST 2022
  $ date
  Sat Nov 12 08:37:06 PM PST 2022

Cc: John Fastabend <john.fastabend@gmail.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Martin KaFai Lau <martin.lau@linux.dev>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Willem de Bruijn <willemb@google.com>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Cc: Anatoly Burakov <anatoly.burakov@intel.com>
Cc: Alexander Lobakin <alexandr.lobakin@intel.com>
Cc: Magnus Karlsson <magnus.karlsson@gmail.com>
Cc: Maryam Tahhan <mtahhan@redhat.com>
Cc: xdp-hints@xdp-project.net
Cc: netdev@vger.kernel.org
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/r/20230119221536.3349901-18-sdf@google.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
21 months agonet/mlx5e: Support RX XDP metadata
Toke Høiland-Jørgensen [Thu, 19 Jan 2023 22:15:35 +0000 (14:15 -0800)]
net/mlx5e: Support RX XDP metadata

Support RX hash and timestamp metadata kfuncs. We need to pass in the cqe
pointer to the mlx5e_skb_from* functions so it can be retrieved from the
XDP ctx to do this.

Cc: Tariq Toukan <tariqt@nvidia.com>
Cc: Saeed Mahameed <saeedm@nvidia.com>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Martin KaFai Lau <martin.lau@linux.dev>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Willem de Bruijn <willemb@google.com>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Cc: Anatoly Burakov <anatoly.burakov@intel.com>
Cc: Alexander Lobakin <alexandr.lobakin@intel.com>
Cc: Magnus Karlsson <magnus.karlsson@gmail.com>
Cc: Maryam Tahhan <mtahhan@redhat.com>
Cc: xdp-hints@xdp-project.net
Cc: netdev@vger.kernel.org
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://lore.kernel.org/r/20230119221536.3349901-17-sdf@google.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
21 months agonet/mlx5e: Introduce wrapper for xdp_buff
Toke Høiland-Jørgensen [Thu, 19 Jan 2023 22:15:34 +0000 (14:15 -0800)]
net/mlx5e: Introduce wrapper for xdp_buff

Preparation for implementing HW metadata kfuncs. No functional change.

Cc: Tariq Toukan <tariqt@nvidia.com>
Cc: Saeed Mahameed <saeedm@nvidia.com>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Martin KaFai Lau <martin.lau@linux.dev>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Willem de Bruijn <willemb@google.com>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Cc: Anatoly Burakov <anatoly.burakov@intel.com>
Cc: Alexander Lobakin <alexandr.lobakin@intel.com>
Cc: Magnus Karlsson <magnus.karlsson@gmail.com>
Cc: Maryam Tahhan <mtahhan@redhat.com>
Cc: xdp-hints@xdp-project.net
Cc: netdev@vger.kernel.org
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://lore.kernel.org/r/20230119221536.3349901-16-sdf@google.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
21 months agoxsk: Add cb area to struct xdp_buff_xsk
Toke Høiland-Jørgensen [Thu, 19 Jan 2023 22:15:33 +0000 (14:15 -0800)]
xsk: Add cb area to struct xdp_buff_xsk

Add an area after the xdp_buff in struct xdp_buff_xsk that drivers can use
to stash extra information to use in metadata kfuncs. The maximum size of
24 bytes means the full xdp_buff_xsk structure will take up exactly two
cache lines (with the cb field spanning both). Also add a macro drivers can
use to check their own wrapping structs against the available size.

Cc: John Fastabend <john.fastabend@gmail.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Martin KaFai Lau <martin.lau@linux.dev>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Willem de Bruijn <willemb@google.com>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Cc: Anatoly Burakov <anatoly.burakov@intel.com>
Cc: Alexander Lobakin <alexandr.lobakin@intel.com>
Cc: Magnus Karlsson <magnus.karlsson@gmail.com>
Cc: Maryam Tahhan <mtahhan@redhat.com>
Cc: xdp-hints@xdp-project.net
Cc: netdev@vger.kernel.org
Suggested-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/r/20230119221536.3349901-15-sdf@google.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
21 months agonet/mlx4_en: Support RX XDP metadata
Stanislav Fomichev [Thu, 19 Jan 2023 22:15:32 +0000 (14:15 -0800)]
net/mlx4_en: Support RX XDP metadata

RX timestamp and hash for now. Tested using the prog from the next
patch.

Also enabling xdp metadata support; don't see why it's disabled,
there is enough headroom..

Cc: John Fastabend <john.fastabend@gmail.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Martin KaFai Lau <martin.lau@linux.dev>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Willem de Bruijn <willemb@google.com>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Cc: Anatoly Burakov <anatoly.burakov@intel.com>
Cc: Alexander Lobakin <alexandr.lobakin@intel.com>
Cc: Magnus Karlsson <magnus.karlsson@gmail.com>
Cc: Maryam Tahhan <mtahhan@redhat.com>
Cc: xdp-hints@xdp-project.net
Cc: netdev@vger.kernel.org
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/r/20230119221536.3349901-14-sdf@google.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
21 months agonet/mlx4_en: Introduce wrapper for xdp_buff
Stanislav Fomichev [Thu, 19 Jan 2023 22:15:31 +0000 (14:15 -0800)]
net/mlx4_en: Introduce wrapper for xdp_buff

No functional changes. Boilerplate to allow stuffing more data after xdp_buff.

Cc: John Fastabend <john.fastabend@gmail.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Martin KaFai Lau <martin.lau@linux.dev>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Willem de Bruijn <willemb@google.com>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Cc: Anatoly Burakov <anatoly.burakov@intel.com>
Cc: Alexander Lobakin <alexandr.lobakin@intel.com>
Cc: Magnus Karlsson <magnus.karlsson@gmail.com>
Cc: Maryam Tahhan <mtahhan@redhat.com>
Cc: xdp-hints@xdp-project.net
Cc: netdev@vger.kernel.org
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/r/20230119221536.3349901-13-sdf@google.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
21 months agoselftests/bpf: Verify xdp_metadata xdp->af_xdp path
Stanislav Fomichev [Thu, 19 Jan 2023 22:15:30 +0000 (14:15 -0800)]
selftests/bpf: Verify xdp_metadata xdp->af_xdp path

- create new netns
- create veth pair (veTX+veRX)
- setup AF_XDP socket for both interfaces
- attach bpf to veRX
- send packet via veTX
- verify the packet has expected metadata at veRX

Cc: John Fastabend <john.fastabend@gmail.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Martin KaFai Lau <martin.lau@linux.dev>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Willem de Bruijn <willemb@google.com>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Cc: Anatoly Burakov <anatoly.burakov@intel.com>
Cc: Alexander Lobakin <alexandr.lobakin@intel.com>
Cc: Magnus Karlsson <magnus.karlsson@gmail.com>
Cc: Maryam Tahhan <mtahhan@redhat.com>
Cc: xdp-hints@xdp-project.net
Cc: netdev@vger.kernel.org
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/r/20230119221536.3349901-12-sdf@google.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
21 months agoveth: Support RX XDP metadata
Stanislav Fomichev [Thu, 19 Jan 2023 22:15:29 +0000 (14:15 -0800)]
veth: Support RX XDP metadata

The goal is to enable end-to-end testing of the metadata for AF_XDP.

Cc: John Fastabend <john.fastabend@gmail.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Martin KaFai Lau <martin.lau@linux.dev>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Willem de Bruijn <willemb@google.com>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Cc: Anatoly Burakov <anatoly.burakov@intel.com>
Cc: Alexander Lobakin <alexandr.lobakin@intel.com>
Cc: Magnus Karlsson <magnus.karlsson@gmail.com>
Cc: Maryam Tahhan <mtahhan@redhat.com>
Cc: xdp-hints@xdp-project.net
Cc: netdev@vger.kernel.org
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/r/20230119221536.3349901-11-sdf@google.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
21 months agoveth: Introduce veth_xdp_buff wrapper for xdp_buff
Stanislav Fomichev [Thu, 19 Jan 2023 22:15:28 +0000 (14:15 -0800)]
veth: Introduce veth_xdp_buff wrapper for xdp_buff

No functional changes. Boilerplate to allow stuffing more data after xdp_buff.

Cc: Martin KaFai Lau <martin.lau@linux.dev>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Willem de Bruijn <willemb@google.com>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Cc: Anatoly Burakov <anatoly.burakov@intel.com>
Cc: Alexander Lobakin <alexandr.lobakin@intel.com>
Cc: Magnus Karlsson <magnus.karlsson@gmail.com>
Cc: Maryam Tahhan <mtahhan@redhat.com>
Cc: xdp-hints@xdp-project.net
Cc: netdev@vger.kernel.org
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/r/20230119221536.3349901-10-sdf@google.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
21 months agobpf: Support consuming XDP HW metadata from fext programs
Toke Høiland-Jørgensen [Thu, 19 Jan 2023 22:15:27 +0000 (14:15 -0800)]
bpf: Support consuming XDP HW metadata from fext programs

Instead of rejecting the attaching of PROG_TYPE_EXT programs to XDP
programs that consume HW metadata, implement support for propagating the
offload information. The extension program doesn't need to set a flag or
ifindex, these will just be propagated from the target by the verifier.
We need to create a separate offload object for the extension program,
though, since it can be reattached to a different program later (which
means we can't just inherit the offload information from the target).

An additional check is added on attach that the new target is compatible
with the offload information in the extension prog.

Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/r/20230119221536.3349901-9-sdf@google.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
21 months agobpf: XDP metadata RX kfuncs
Stanislav Fomichev [Thu, 19 Jan 2023 22:15:26 +0000 (14:15 -0800)]
bpf: XDP metadata RX kfuncs

Define a new kfunc set (xdp_metadata_kfunc_ids) which implements all possible
XDP metatada kfuncs. Not all devices have to implement them. If kfunc is not
supported by the target device, the default implementation is called instead.
The verifier, at load time, replaces a call to the generic kfunc with a call
to the per-device one. Per-device kfunc pointers are stored in separate
struct xdp_metadata_ops.

Cc: John Fastabend <john.fastabend@gmail.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Martin KaFai Lau <martin.lau@linux.dev>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Willem de Bruijn <willemb@google.com>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Cc: Anatoly Burakov <anatoly.burakov@intel.com>
Cc: Alexander Lobakin <alexandr.lobakin@intel.com>
Cc: Magnus Karlsson <magnus.karlsson@gmail.com>
Cc: Maryam Tahhan <mtahhan@redhat.com>
Cc: xdp-hints@xdp-project.net
Cc: netdev@vger.kernel.org
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/r/20230119221536.3349901-8-sdf@google.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
21 months agoselftests/bpf: Update expected test_offload.py messages
Stanislav Fomichev [Thu, 19 Jan 2023 22:15:25 +0000 (14:15 -0800)]
selftests/bpf: Update expected test_offload.py messages

Generic check has a different error message, update the selftest.

Cc: John Fastabend <john.fastabend@gmail.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Martin KaFai Lau <martin.lau@linux.dev>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Willem de Bruijn <willemb@google.com>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Cc: Anatoly Burakov <anatoly.burakov@intel.com>
Cc: Alexander Lobakin <alexandr.lobakin@intel.com>
Cc: Magnus Karlsson <magnus.karlsson@gmail.com>
Cc: Maryam Tahhan <mtahhan@redhat.com>
Cc: xdp-hints@xdp-project.net
Cc: netdev@vger.kernel.org
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/r/20230119221536.3349901-7-sdf@google.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
21 months agobpf: Introduce device-bound XDP programs
Stanislav Fomichev [Thu, 19 Jan 2023 22:15:24 +0000 (14:15 -0800)]
bpf: Introduce device-bound XDP programs

New flag BPF_F_XDP_DEV_BOUND_ONLY plus all the infra to have a way
to associate a netdev with a BPF program at load time.

netdevsim checks are dropped in favor of generic check in dev_xdp_attach.

Cc: John Fastabend <john.fastabend@gmail.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Martin KaFai Lau <martin.lau@linux.dev>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Willem de Bruijn <willemb@google.com>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Cc: Anatoly Burakov <anatoly.burakov@intel.com>
Cc: Alexander Lobakin <alexandr.lobakin@intel.com>
Cc: Magnus Karlsson <magnus.karlsson@gmail.com>
Cc: Maryam Tahhan <mtahhan@redhat.com>
Cc: xdp-hints@xdp-project.net
Cc: netdev@vger.kernel.org
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/r/20230119221536.3349901-6-sdf@google.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
21 months agobpf: Reshuffle some parts of bpf/offload.c
Stanislav Fomichev [Thu, 19 Jan 2023 22:15:23 +0000 (14:15 -0800)]
bpf: Reshuffle some parts of bpf/offload.c

To avoid adding forward declarations in the main patch, shuffle
some code around. No functional changes.

Cc: John Fastabend <john.fastabend@gmail.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Martin KaFai Lau <martin.lau@linux.dev>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Willem de Bruijn <willemb@google.com>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Cc: Anatoly Burakov <anatoly.burakov@intel.com>
Cc: Alexander Lobakin <alexandr.lobakin@intel.com>
Cc: Magnus Karlsson <magnus.karlsson@gmail.com>
Cc: Maryam Tahhan <mtahhan@redhat.com>
Cc: xdp-hints@xdp-project.net
Cc: netdev@vger.kernel.org
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/r/20230119221536.3349901-5-sdf@google.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
21 months agobpf: Move offload initialization into late_initcall
Stanislav Fomichev [Thu, 19 Jan 2023 22:15:22 +0000 (14:15 -0800)]
bpf: Move offload initialization into late_initcall

So we don't have to initialize it manually from several paths.

Cc: John Fastabend <john.fastabend@gmail.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Martin KaFai Lau <martin.lau@linux.dev>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Willem de Bruijn <willemb@google.com>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Cc: Anatoly Burakov <anatoly.burakov@intel.com>
Cc: Alexander Lobakin <alexandr.lobakin@intel.com>
Cc: Magnus Karlsson <magnus.karlsson@gmail.com>
Cc: Maryam Tahhan <mtahhan@redhat.com>
Cc: xdp-hints@xdp-project.net
Cc: netdev@vger.kernel.org
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/r/20230119221536.3349901-4-sdf@google.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
21 months agobpf: Rename bpf_{prog,map}_is_dev_bound to is_offloaded
Stanislav Fomichev [Thu, 19 Jan 2023 22:15:21 +0000 (14:15 -0800)]
bpf: Rename bpf_{prog,map}_is_dev_bound to is_offloaded

BPF offloading infra will be reused to implement
bound-but-not-offloaded bpf programs. Rename existing
helpers for clarity. No functional changes.

Cc: John Fastabend <john.fastabend@gmail.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Martin KaFai Lau <martin.lau@linux.dev>
Cc: Willem de Bruijn <willemb@google.com>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Cc: Anatoly Burakov <anatoly.burakov@intel.com>
Cc: Alexander Lobakin <alexandr.lobakin@intel.com>
Cc: Magnus Karlsson <magnus.karlsson@gmail.com>
Cc: Maryam Tahhan <mtahhan@redhat.com>
Cc: xdp-hints@xdp-project.net
Cc: netdev@vger.kernel.org
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/r/20230119221536.3349901-3-sdf@google.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
21 months agobpf: Document XDP RX metadata
Stanislav Fomichev [Thu, 19 Jan 2023 22:15:20 +0000 (14:15 -0800)]
bpf: Document XDP RX metadata

Document all current use-cases and assumptions.

Cc: John Fastabend <john.fastabend@gmail.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Martin KaFai Lau <martin.lau@linux.dev>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Willem de Bruijn <willemb@google.com>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Cc: Anatoly Burakov <anatoly.burakov@intel.com>
Cc: Alexander Lobakin <alexandr.lobakin@intel.com>
Cc: Magnus Karlsson <magnus.karlsson@gmail.com>
Cc: Maryam Tahhan <mtahhan@redhat.com>
Cc: xdp-hints@xdp-project.net
Cc: netdev@vger.kernel.org
Acked-by: David Vernet <void@manifault.com>
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/r/20230119221536.3349901-2-sdf@google.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
21 months agoMerge branch 'Dynptr fixes'
Alexei Starovoitov [Sat, 21 Jan 2023 01:55:04 +0000 (17:55 -0800)]
Merge branch 'Dynptr fixes'

Kumar Kartikeya Dwivedi says:

====================

This is part 2 of https://lore.kernel.org/bpf/20221018135920.726360-1-memxor@gmail.com.

Changelog:
----------
v4 -> v5
v5: https://lore.kernel.org/bpf/20230120070355.1983560-1-memxor@gmail.com

 * Add comments, tests from Joanne
 * Add Joanne's acks

v3 -> v4
v3: https://lore.kernel.org/bpf/20230120034314.1921848-1-memxor@gmail.com

 * Adopt BPF ASM tests to more readable style (Alexei)

v2 -> v3
v2: https://lore.kernel.org/bpf/20230119021442.1465269-1-memxor@gmail.com

 * Fix slice invalidation logic for unreferenced dynptrs (Joanne)
 * Add selftests for precise slice invalidation on destruction
 * Add Joanne's acks

v1 -> v2
v1: https://lore.kernel.org/bpf/20230101083403.332783-1-memxor@gmail.com

 * Return error early in case of overwriting referenced dynptr slots (Andrii, Joanne)
 * Rename destroy_stack_slots_dynptr to destroy_if_dynptr_stack_slot (Joanne)
 * Invalidate dynptr slices associated with dynptr in destroy_if_dynptr_stack_slot (Joanne)
 * Combine both dynptr_get_spi and is_spi_bounds_valid (Joanne)
 * Compute spi once in process_dynptr_func and pass it as parameter instead of recomputing (Joanne)
 * Add comments expanding REG_LIVE_WRITTEN marking in unmark_stack_slots_dynptr (Joanne)
 * Add comments explaining why destroy_if_dynptr_stack_slot call needs to be done for both spi
   and spi - 1 (Joanne)
 * Port BPF assembly tests from test_verifier to test_progs framework (Andrii)
 * Address misc feedback, rebase to bpf-next

Old v1 -> v1
Old v1: https://lore.kernel.org/bpf/20221018135920.726360-1-memxor@gmail.com

 * Allow overwriting dynptr stack slots from dynptr init helpers
 * Fix a bug in alignment check where reg->var_off.value was still not included
 * Address other minor nits

Eduard Zingerman (1):
  selftests/bpf: convenience macro for use with 'asm volatile' blocks
====================

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
21 months agoselftests/bpf: Add dynptr helper tests
Kumar Kartikeya Dwivedi [Sat, 21 Jan 2023 00:22:41 +0000 (05:52 +0530)]
selftests/bpf: Add dynptr helper tests

First test that we allow overwriting dynptr slots and reinitializing
them in unreferenced case, and disallow overwriting for referenced case.
Include tests to ensure slices obtained from destroyed dynptrs are being
invalidated on their destruction. The destruction needs to be scoped, as
in slices of dynptr A should not be invalidated when dynptr B is
destroyed. Next, test that MEM_UNINIT doesn't allow writing dynptr stack
slots.

Acked-by: Joanne Koong <joannelkoong@gmail.com>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20230121002241.2113993-13-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
21 months agoselftests/bpf: Add dynptr partial slot overwrite tests
Kumar Kartikeya Dwivedi [Sat, 21 Jan 2023 00:22:40 +0000 (05:52 +0530)]
selftests/bpf: Add dynptr partial slot overwrite tests

Try creating a dynptr, then overwriting second slot with first slot of
another dynptr. Then, the first slot of first dynptr should also be
invalidated, but without our fix that does not happen. As a consequence,
the unfixed case allows passing first dynptr (as the kernel check only
checks for slot_type and then first_slot == true).

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20230121002241.2113993-12-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
21 months agoselftests/bpf: Add dynptr var_off tests
Kumar Kartikeya Dwivedi [Sat, 21 Jan 2023 00:22:39 +0000 (05:52 +0530)]
selftests/bpf: Add dynptr var_off tests

Ensure that variable offset is handled correctly, and verifier takes
both fixed and variable part into account. Also ensures that only
constant var_off is allowed.

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20230121002241.2113993-11-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
21 months agoselftests/bpf: Add dynptr pruning tests
Kumar Kartikeya Dwivedi [Sat, 21 Jan 2023 00:22:38 +0000 (05:52 +0530)]
selftests/bpf: Add dynptr pruning tests

Add verifier tests that verify the new pruning behavior for STACK_DYNPTR
slots, and ensure that state equivalence takes into account changes to
the old and current verifier state correctly. Also ensure that the
stacksafe changes are actually enabling pruning in case states are
equivalent from pruning PoV.

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20230121002241.2113993-10-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
21 months agoselftests/bpf: convenience macro for use with 'asm volatile' blocks
Eduard Zingerman [Sat, 21 Jan 2023 00:22:37 +0000 (05:52 +0530)]
selftests/bpf: convenience macro for use with 'asm volatile' blocks

A set of macros useful for writing naked BPF functions using inline
assembly. E.g. as follows:

struct map_struct {
...
} map SEC(".maps");

SEC(...)
__naked int foo_test(void)
{
asm volatile(
"r0 = 0;"
"*(u64*)(r10 - 8) = r0;"
"r1 = %[map] ll;"
"r2 = r10;"
"r2 += -8;"
"call %[bpf_map_lookup_elem];"
"r0 = 0;"
"exit;"
:
: __imm(bpf_map_lookup_elem),
  __imm_addr(map)
: __clobber_all);
}

Acked-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
[ Kartikeya: Add acks, include __clobber_common from Andrii ]
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20230121002241.2113993-9-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
21 months agobpf: Avoid recomputing spi in process_dynptr_func
Kumar Kartikeya Dwivedi [Sat, 21 Jan 2023 00:22:36 +0000 (05:52 +0530)]
bpf: Avoid recomputing spi in process_dynptr_func

Currently, process_dynptr_func first calls dynptr_get_spi and then
is_dynptr_reg_valid_init and is_dynptr_reg_valid_uninit have to call it
again to obtain the spi value. Instead of doing this twice, reuse the
already obtained value (which is by default 0, and is only set for
PTR_TO_STACK, and only used in that case in aforementioned functions).
The input value for these two functions will either be -ERANGE or >= 1,
and can either be permitted or rejected based on the respective check.

Suggested-by: Joanne Koong <joannelkoong@gmail.com>
Acked-by: Joanne Koong <joannelkoong@gmail.com>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20230121002241.2113993-8-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
21 months agobpf: Combine dynptr_get_spi and is_spi_bounds_valid
Kumar Kartikeya Dwivedi [Sat, 21 Jan 2023 00:22:35 +0000 (05:52 +0530)]
bpf: Combine dynptr_get_spi and is_spi_bounds_valid

Currently, a check on spi resides in dynptr_get_spi, while others
checking its validity for being within the allocated stack slots happens
in is_spi_bounds_valid. Almost always barring a couple of cases (where
being beyond allocated stack slots is not an error as stack slots need
to be populated), both are used together to make checks. Hence, subsume
the is_spi_bounds_valid check in dynptr_get_spi, and return -ERANGE to
specially distinguish the case where spi is valid but not within
allocated slots in the stack state.

The is_spi_bounds_valid function is still kept around as it is a generic
helper that will be useful for other objects on stack similar to dynptr
in the future.

Suggested-by: Joanne Koong <joannelkoong@gmail.com>
Acked-by: Joanne Koong <joannelkoong@gmail.com>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20230121002241.2113993-7-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
21 months agobpf: Allow reinitializing unreferenced dynptr stack slots
Kumar Kartikeya Dwivedi [Sat, 21 Jan 2023 00:22:34 +0000 (05:52 +0530)]
bpf: Allow reinitializing unreferenced dynptr stack slots

Consider a program like below:

void prog(void)
{
{
struct bpf_dynptr ptr;
bpf_dynptr_from_mem(...);
}
...
{
struct bpf_dynptr ptr;
bpf_dynptr_from_mem(...);
}
}

Here, the C compiler based on lifetime rules in the C standard would be
well within in its rights to share stack storage for dynptr 'ptr' as
their lifetimes do not overlap in the two distinct scopes. Currently,
such an example would be rejected by the verifier, but this is too
strict. Instead, we should allow reinitializing over dynptr stack slots
and forget information about the old dynptr object.

The destroy_if_dynptr_stack_slot function already makes necessary checks
to avoid overwriting referenced dynptr slots. This is done to present a
better error message instead of forgetting dynptr information on stack
and preserving reference state, leading to an inevitable but
undecipherable error at the end about an unreleased reference which has
to be associated back to its allocating call instruction to make any
sense to the user.

Acked-by: Joanne Koong <joannelkoong@gmail.com>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20230121002241.2113993-6-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
21 months agobpf: Invalidate slices on destruction of dynptrs on stack
Kumar Kartikeya Dwivedi [Sat, 21 Jan 2023 00:22:33 +0000 (05:52 +0530)]
bpf: Invalidate slices on destruction of dynptrs on stack

The previous commit implemented destroy_if_dynptr_stack_slot. It
destroys the dynptr which given spi belongs to, but still doesn't
invalidate the slices that belong to such a dynptr. While for the case
of referenced dynptr, we don't allow their overwrite and return an error
early, we still allow it and destroy the dynptr for unreferenced dynptr.

To be able to enable precise and scoped invalidation of dynptr slices in
this case, we must be able to associate the source dynptr of slices that
have been obtained using bpf_dynptr_data. When doing destruction, only
slices belonging to the dynptr being destructed should be invalidated,
and nothing else. Currently, dynptr slices belonging to different
dynptrs are indistinguishible.

Hence, allocate a unique id to each dynptr (CONST_PTR_TO_DYNPTR and
those on stack). This will be stored as part of reg->id. Whenever using
bpf_dynptr_data, transfer this unique dynptr id to the returned
PTR_TO_MEM_OR_NULL slice pointer, and store it in a new per-PTR_TO_MEM
dynptr_id register state member.

Finally, after establishing such a relationship between dynptrs and
their slices, implement precise invalidation logic that only invalidates
slices belong to the destroyed dynptr in destroy_if_dynptr_stack_slot.

Acked-by: Joanne Koong <joannelkoong@gmail.com>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20230121002241.2113993-5-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
21 months agobpf: Fix partial dynptr stack slot reads/writes
Kumar Kartikeya Dwivedi [Sat, 21 Jan 2023 00:22:32 +0000 (05:52 +0530)]
bpf: Fix partial dynptr stack slot reads/writes

Currently, while reads are disallowed for dynptr stack slots, writes are
not. Reads don't work from both direct access and helpers, while writes
do work in both cases, but have the effect of overwriting the slot_type.

While this is fine, handling for a few edge cases is missing. Firstly,
a user can overwrite the stack slots of dynptr partially.

Consider the following layout:
spi: [d][d][?]
      2  1  0

First slot is at spi 2, second at spi 1.
Now, do a write of 1 to 8 bytes for spi 1.

This will essentially either write STACK_MISC for all slot_types or
STACK_MISC and STACK_ZERO (in case of size < BPF_REG_SIZE partial write
of zeroes). The end result is that slot is scrubbed.

Now, the layout is:
spi: [d][m][?]
      2  1  0

Suppose if user initializes spi = 1 as dynptr.
We get:
spi: [d][d][d]
      2  1  0

But this time, both spi 2 and spi 1 have first_slot = true.

Now, when passing spi 2 to dynptr helper, it will consider it as
initialized as it does not check whether second slot has first_slot ==
false. And spi 1 should already work as normal.

This effectively replaced size + offset of first dynptr, hence allowing
invalid OOB reads and writes.

Make a few changes to protect against this:
When writing to PTR_TO_STACK using BPF insns, when we touch spi of a
STACK_DYNPTR type, mark both first and second slot (regardless of which
slot we touch) as STACK_INVALID. Reads are already prevented.

Second, prevent writing to stack memory from helpers if the range may
contain any STACK_DYNPTR slots. Reads are already prevented.

For helpers, we cannot allow it to destroy dynptrs from the writes as
depending on arguments, helper may take uninit_mem and dynptr both at
the same time. This would mean that helper may write to uninit_mem
before it reads the dynptr, which would be bad.

PTR_TO_MEM: [?????dd]

Depending on the code inside the helper, it may end up overwriting the
dynptr contents first and then read those as the dynptr argument.

Verifier would only simulate destruction when it does byte by byte
access simulation in check_helper_call for meta.access_size, and
fail to catch this case, as it happens after argument checks.

The same would need to be done for any other non-trivial objects created
on the stack in the future, such as bpf_list_head on stack, or
bpf_rb_root on stack.

A common misunderstanding in the current code is that MEM_UNINIT means
writes, but note that writes may also be performed even without
MEM_UNINIT in case of helpers, in that case the code after handling meta
&& meta->raw_mode will complain when it sees STACK_DYNPTR. So that
invalid read case also covers writes to potential STACK_DYNPTR slots.
The only loophole was in case of meta->raw_mode which simulated writes
through instructions which could overwrite them.

A future series sequenced after this will focus on the clean up of
helper access checks and bugs around that.

Fixes: 97e03f521050 ("bpf: Add verifier support for dynptrs")
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20230121002241.2113993-4-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
21 months agobpf: Fix missing var_off check for ARG_PTR_TO_DYNPTR
Kumar Kartikeya Dwivedi [Sat, 21 Jan 2023 00:22:31 +0000 (05:52 +0530)]
bpf: Fix missing var_off check for ARG_PTR_TO_DYNPTR

Currently, the dynptr function is not checking the variable offset part
of PTR_TO_STACK that it needs to check. The fixed offset is considered
when computing the stack pointer index, but if the variable offset was
not a constant (such that it could not be accumulated in reg->off), we
will end up a discrepency where runtime pointer does not point to the
actual stack slot we mark as STACK_DYNPTR.

It is impossible to precisely track dynptr state when variable offset is
not constant, hence, just like bpf_timer, kptr, bpf_spin_lock, etc.
simply reject the case where reg->var_off is not constant. Then,
consider both reg->off and reg->var_off.value when computing the stack
pointer index.

A new helper dynptr_get_spi is introduced to hide over these details
since the dynptr needs to be located in multiple places outside the
process_dynptr_func checks, hence once we know it's a PTR_TO_STACK, we
need to enforce these checks in all places.

Note that it is disallowed for unprivileged users to have a non-constant
var_off, so this problem should only be possible to trigger from
programs having CAP_PERFMON. However, its effects can vary.

Without the fix, it is possible to replace the contents of the dynptr
arbitrarily by making verifier mark different stack slots than actual
location and then doing writes to the actual stack address of dynptr at
runtime.

Fixes: 97e03f521050 ("bpf: Add verifier support for dynptrs")
Acked-by: Joanne Koong <joannelkoong@gmail.com>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20230121002241.2113993-3-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
21 months agobpf: Fix state pruning for STACK_DYNPTR stack slots
Kumar Kartikeya Dwivedi [Sat, 21 Jan 2023 00:22:30 +0000 (05:52 +0530)]
bpf: Fix state pruning for STACK_DYNPTR stack slots

The root of the problem is missing liveness marking for STACK_DYNPTR
slots. This leads to all kinds of problems inside stacksafe.

The verifier by default inside stacksafe ignores spilled_ptr in stack
slots which do not have REG_LIVE_READ marks. Since this is being checked
in the 'old' explored state, it must have already done clean_live_states
for this old bpf_func_state. Hence, it won't be receiving any more
liveness marks from to be explored insns (it has received REG_LIVE_DONE
marking from liveness point of view).

What this means is that verifier considers that it's safe to not compare
the stack slot if was never read by children states. While liveness
marks are usually propagated correctly following the parentage chain for
spilled registers (SCALAR_VALUE and PTR_* types), the same is not the
case for STACK_DYNPTR.

clean_live_states hence simply rewrites these stack slots to the type
STACK_INVALID since it sees no REG_LIVE_READ marks.

The end result is that we will never see STACK_DYNPTR slots in explored
state. Even if verifier was conservatively matching !REG_LIVE_READ
slots, very next check continuing the stacksafe loop on seeing
STACK_INVALID would again prevent further checks.

Now as long as verifier stores an explored state which we can compare to
when reaching a pruning point, we can abuse this bug to make verifier
prune search for obviously unsafe paths using STACK_DYNPTR slots
thinking they are never used hence safe.

Doing this in unprivileged mode is a bit challenging. add_new_state is
only set when seeing BPF_F_TEST_STATE_FREQ (which requires privileges)
or when jmps_processed difference is >= 2 and insn_processed difference
is >= 8. So coming up with the unprivileged case requires a little more
work, but it is still totally possible. The test case being discussed
below triggers the heuristic even in unprivileged mode.

However, it no longer works since commit
8addbfc7b308 ("bpf: Gate dynptr API behind CAP_BPF").

Let's try to study the test step by step.

Consider the following program (C style BPF ASM):

0  r0 = 0;
1  r6 = &ringbuf_map;
3  r1 = r6;
4  r2 = 8;
5  r3 = 0;
6  r4 = r10;
7  r4 -= -16;
8  call bpf_ringbuf_reserve_dynptr;
9  if r0 == 0 goto pc+1;
10 goto pc+1;
11 *(r10 - 16) = 0xeB9F;
12 r1 = r10;
13 r1 -= -16;
14 r2 = 0;
15 call bpf_ringbuf_discard_dynptr;
16 r0 = 0;
17 exit;

We know that insn 12 will be a pruning point, hence if we force
add_new_state for it, it will first verify the following path as
safe in straight line exploration:
0 1 3 4 5 6 7 8 9 -> 10 -> (12) 13 14 15 16 17

Then, when we arrive at insn 12 from the following path:
0 1 3 4 5 6 7 8 9 -> 11 (12)

We will find a state that has been verified as safe already at insn 12.
Since register state is same at this point, regsafe will pass. Next, in
stacksafe, for spi = 0 and spi = 1 (location of our dynptr) is skipped
seeing !REG_LIVE_READ. The rest matches, so stacksafe returns true.
Next, refsafe is also true as reference state is unchanged in both
states.

The states are considered equivalent and search is pruned.

Hence, we are able to construct a dynptr with arbitrary contents and use
the dynptr API to operate on this arbitrary pointer and arbitrary size +
offset.

To fix this, first define a mark_dynptr_read function that propagates
liveness marks whenever a valid initialized dynptr is accessed by dynptr
helpers. REG_LIVE_WRITTEN is marked whenever we initialize an
uninitialized dynptr. This is done in mark_stack_slots_dynptr. It allows
screening off mark_reg_read and not propagating marks upwards from that
point.

This ensures that we either set REG_LIVE_READ64 on both dynptr slots, or
none, so clean_live_states either sets both slots to STACK_INVALID or
none of them. This is the invariant the checks inside stacksafe rely on.

Next, do a complete comparison of both stack slots whenever they have
STACK_DYNPTR. Compare the dynptr type stored in the spilled_ptr, and
also whether both form the same first_slot. Only then is the later path
safe.

Fixes: 97e03f521050 ("bpf: Add verifier support for dynptrs")
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20230121002241.2113993-2-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
21 months agoMerge branch 'kallsyms: Optimize the search for module symbols by livepatch and bpf'
Alexei Starovoitov [Fri, 20 Jan 2023 01:07:15 +0000 (17:07 -0800)]
Merge branch 'kallsyms: Optimize the search for module symbols by livepatch and bpf'

Jiri Olsa says:

====================

hi,
sending new version of [1] patchset posted originally by Zhen Lei.
It contains 2 changes that improove search performance for livepatch
and bpf.

v3 changes:
  - fixed off by 1 issue, simplified condition, added acks [Song]
  - added module attach as subtest [Andrii]

v2 changes:
  - reworked the bpf change and meassured the performance
  - adding new selftest to benchmark kprobe multi module attachment
  - skipping patch 3 as requested by Zhen Lei
  - added Reviewed-by for patch 1 [Petr Mladek]

thanks,
jirka

[1] https://lore.kernel.org/bpf/20221230112729.351-1-thunder.leizhen@huawei.com/
---
Jiri Olsa (2):
      selftests/bpf: Add serial_test_kprobe_multi_bench_attach_kernel/module tests
      bpf: Change modules resolving for kprobe multi link
====================

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
21 months agobpf: Change modules resolving for kprobe multi link
Jiri Olsa [Mon, 16 Jan 2023 10:10:09 +0000 (11:10 +0100)]
bpf: Change modules resolving for kprobe multi link

We currently use module_kallsyms_on_each_symbol that iterates all
modules/symbols and we try to lookup each such address in user
provided symbols/addresses to get list of used modules.

This fix instead only iterates provided kprobe addresses and calls
__module_address on each to get list of used modules. This turned
out to be simpler and also bit faster.

On my setup with workload (executed 10 times):

   # test_progs -t kprobe_multi_bench_attach/modules

Current code:

 Performance counter stats for './test.sh' (5 runs):

    76,081,161,596      cycles:k                   ( +-  0.47% )

           18.3867 +- 0.0992 seconds time elapsed  ( +-  0.54% )

With the fix:

 Performance counter stats for './test.sh' (5 runs):

    74,079,889,063      cycles:k                   ( +-  0.04% )

           17.8514 +- 0.0218 seconds time elapsed  ( +-  0.12% )

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Reviewed-by: Zhen Lei <thunder.leizhen@huawei.com>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Link: https://lore.kernel.org/r/20230116101009.23694-4-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
21 months agoselftests/bpf: Add serial_test_kprobe_multi_bench_attach_kernel/module tests
Jiri Olsa [Mon, 16 Jan 2023 10:10:08 +0000 (11:10 +0100)]
selftests/bpf: Add serial_test_kprobe_multi_bench_attach_kernel/module tests

Add bench test for module portion of the symbols as well.

  # ./test_progs -v -t kprobe_multi_bench_attach_module
  bpf_testmod.ko is already unloaded.
  Loading bpf_testmod.ko...
  Successfully loaded bpf_testmod.ko.
  test_kprobe_multi_bench_attach:PASS:get_syms 0 nsec
  test_kprobe_multi_bench_attach:PASS:kprobe_multi_empty__open_and_load 0 nsec
  test_kprobe_multi_bench_attach:PASS:bpf_program__attach_kprobe_multi_opts 0 nsec
  test_kprobe_multi_bench_attach: found 26620 functions
  test_kprobe_multi_bench_attach: attached in   0.182s
  test_kprobe_multi_bench_attach: detached in   0.082s
  #96      kprobe_multi_bench_attach_module:OK
  Summary: 1/0 PASSED, 0 SKIPPED, 0 FAILED
  Successfully unloaded bpf_testmod.ko.

It's useful for testing kprobe multi link modules resolving.

Acked-by: Song Liu <song@kernel.org>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20230116101009.23694-3-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
21 months agolivepatch: Improve the search performance of module_kallsyms_on_each_symbol()
Zhen Lei [Mon, 16 Jan 2023 10:10:07 +0000 (11:10 +0100)]
livepatch: Improve the search performance of module_kallsyms_on_each_symbol()

Currently we traverse all symbols of all modules to find the specified
function for the specified module. But in reality, we just need to find
the given module and then traverse all the symbols in it.

Let's add a new parameter 'const char *modname' to function
module_kallsyms_on_each_symbol(), then we can compare the module names
directly in this function and call hook 'fn' after matching. If 'modname'
is NULL, the symbols of all modules are still traversed for compatibility
with other usage cases.

Phase1: mod1-->mod2..(subsequent modules do not need to be compared)
                |
Phase2:          -->f1-->f2-->f3

Assuming that there are m modules, each module has n symbols on average,
then the time complexity is reduced from O(m * n) to O(m) + O(n).

Reviewed-by: Petr Mladek <pmladek@suse.com>
Acked-by: Song Liu <song@kernel.org>
Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Miroslav Benes <mbenes@suse.cz>
Reviewed-by: Luis Chamberlain <mcgrof@kernel.org>
Link: https://lore.kernel.org/r/20230116101009.23694-2-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
21 months agoselftests/bpf: Fix build errors if CONFIG_NF_CONNTRACK=m
Tiezhu Yang [Wed, 18 Jan 2023 07:56:44 +0000 (15:56 +0800)]
selftests/bpf: Fix build errors if CONFIG_NF_CONNTRACK=m

If CONFIG_NF_CONNTRACK=m, there are no definitions of NF_NAT_MANIP_SRC
and NF_NAT_MANIP_DST in vmlinux.h, build test_bpf_nf.c failed.

$ make -C tools/testing/selftests/bpf/

  CLNG-BPF [test_maps] test_bpf_nf.bpf.o
progs/test_bpf_nf.c:160:42: error: use of undeclared identifier 'NF_NAT_MANIP_SRC'
                bpf_ct_set_nat_info(ct, &saddr, sport, NF_NAT_MANIP_SRC);
                                                       ^
progs/test_bpf_nf.c:163:42: error: use of undeclared identifier 'NF_NAT_MANIP_DST'
                bpf_ct_set_nat_info(ct, &daddr, dport, NF_NAT_MANIP_DST);
                                                       ^
2 errors generated.

Copy the definitions in include/net/netfilter/nf_nat.h to test_bpf_nf.c,
in order to avoid redefinitions if CONFIG_NF_CONNTRACK=y, rename them with
___local suffix. This is similar with commit 1058b6a78db2 ("selftests/bpf:
Do not fail build if CONFIG_NF_CONNTRACK=m/n").

Fixes: b06b45e82b59 ("selftests/bpf: add tests for bpf_ct_set_nat_info kfunc")
Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/r/1674028604-7113-1-git-send-email-yangtiezhu@loongson.cn
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
21 months agobpf/selftests: Add verifier tests for loading sleepable programs
Jiri Olsa [Tue, 17 Jan 2023 22:37:05 +0000 (23:37 +0100)]
bpf/selftests: Add verifier tests for loading sleepable programs

Adding verifier tests for loading all types od allowed
sleepable programs plus reject for tp_btf type.

Acked-by: Song Liu <song@kernel.org>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20230117223705.440975-2-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
21 months agobpf: Do not allow to load sleepable BPF_TRACE_RAW_TP program
Jiri Olsa [Tue, 17 Jan 2023 22:37:04 +0000 (23:37 +0100)]
bpf: Do not allow to load sleepable BPF_TRACE_RAW_TP program

Currently we allow to load any tracing program as sleepable,
but BPF_TRACE_RAW_TP can't sleep. Making the check explicit
for tracing programs attach types, so sleepable BPF_TRACE_RAW_TP
will fail to load.

Updating the verifier error to mention iter programs as well.

Acked-by: Song Liu <song@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20230117223705.440975-1-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
21 months agobtf, scripts: Exclude Rust CUs with pahole
Martin Rodriguez Reboredo [Wed, 11 Jan 2023 15:20:50 +0000 (12:20 -0300)]
btf, scripts: Exclude Rust CUs with pahole

Version 1.24 of pahole has the capability to exclude compilation units (CUs)
of specific languages [1] [2]. Rust, as of writing, is not currently supported
by pahole and if it's used with a build that has BTF debugging enabled it
results in malformed kernel and module binaries [3]. So it's better for pahole
to exclude Rust CUs until support for it arrives.

Co-developed-by: Eric Curtin <ecurtin@redhat.com>
Signed-off-by: Eric Curtin <ecurtin@redhat.com>
Signed-off-by: Martin Rodriguez Reboredo <yakoyoku@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: Eric Curtin <ecurtin@redhat.com>
Reviewed-by: Neal Gompa <neal@gompa.dev>
Acked-by: Miguel Ojeda <ojeda@kernel.org>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Link: https://git.kernel.org/pub/scm/devel/pahole/pahole.git/commit/?id=49358dfe2aaae4e90b072332c3e324019826783f
Link: https://git.kernel.org/pub/scm/devel/pahole/pahole.git/commit/?id=8ee363790b7437283c53090a85a9fec2f0b0fbc4
Link: https://github.com/Rust-for-Linux/linux/issues/735
Link: https://lore.kernel.org/bpf/20230111152050.559334-1-yakoyoku@gmail.com
21 months agoMerge branch 'samples/bpf: modernize BPF functionality test programs'
Alexei Starovoitov [Sun, 15 Jan 2023 21:32:46 +0000 (13:32 -0800)]
Merge branch 'samples/bpf: modernize BPF functionality test programs'

"Daniel T. Lee" says:

====================

Currently, there are many programs under samples/bpf to test the
various functionality of BPF that have been developed for a long time.
However, the kernel (BPF) has changed a lot compared to the 2016 when
some of these test programs were first introduced.

Therefore, some of these programs use the deprecated function of BPF,
and some programs no longer work normally due to changes in the API.

To list some of the kernel changes that this patch set is focusing on,
- legacy BPF map declaration syntax support had been dropped [1]
- bpf_trace_printk() always append newline at the end [2]
- deprecated styled BPF section header (bpf_load style) [3]
- urandom_read tracepoint is removed (used for testing overhead) [4]
- ping sends packet with SOCK_DGRAM instead of SOCK_RAW [5]*
- use "vmlinux.h" instead of including individual headers

In addition to this, this patchset tries to modernize the existing
testing scripts a bit. And for network-related testing programs,
a separate header file was created and applied. (To use the
Endianness conversion function from xdp_sample and bunch of constants)

[1]: https://github.com/libbpf/libbpf/issues/282
[2]: commit ac5a72ea5c89 ("bpf: Use dedicated bpf_trace_printk event instead of trace_printk()")
[3]: commit ceb5dea56543 ("samples: bpf: Remove bpf_load loader completely")
[4]: commit 14c174633f34 ("random: remove unused tracepoints")
[5]: https://lwn.net/Articles/422330/

*: This is quite old, but I'm not sure why the code was initially
   developed to filter only SOCK_RAW.
====================

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
21 months agosamples/bpf: change _kern suffix to .bpf with BPF test programs
Daniel T. Lee [Sun, 15 Jan 2023 07:16:13 +0000 (16:16 +0900)]
samples/bpf: change _kern suffix to .bpf with BPF test programs

This commit changes the _kern suffix to .bpf with the BPF test programs.
With this modification, test programs will inherit the benefit of the
new CLANG-BPF compile target.

Signed-off-by: Daniel T. Lee <danieltimlee@gmail.com>
Link: https://lore.kernel.org/r/20230115071613.125791-11-danieltimlee@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
21 months agosamples/bpf: use vmlinux.h instead of implicit headers in BPF test program
Daniel T. Lee [Sun, 15 Jan 2023 07:16:12 +0000 (16:16 +0900)]
samples/bpf: use vmlinux.h instead of implicit headers in BPF test program

This commit applies vmlinux.h to BPF functionality testing program.
Macros that were not defined despite migration to "vmlinux.h" were
defined separately in individual files.

Signed-off-by: Daniel T. Lee <danieltimlee@gmail.com>
Link: https://lore.kernel.org/r/20230115071613.125791-10-danieltimlee@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
21 months agosamples/bpf: replace BPF programs header with net_shared.h
Daniel T. Lee [Sun, 15 Jan 2023 07:16:11 +0000 (16:16 +0900)]
samples/bpf: replace BPF programs header with net_shared.h

This commit applies "net_shared.h" to BPF programs to remove existing
network related header dependencies. Also, this commit removes
unnecessary headers before applying "vmlinux.h" to the BPF programs.

Mostly, endianness conversion function has been applied to the source.
In addition, several macros have been defined to fulfill the INET,
TC-related constants.

Signed-off-by: Daniel T. Lee <danieltimlee@gmail.com>
Link: https://lore.kernel.org/r/20230115071613.125791-9-danieltimlee@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
21 months agosamples/bpf: split common macros to net_shared.h
Daniel T. Lee [Sun, 15 Jan 2023 07:16:10 +0000 (16:16 +0900)]
samples/bpf: split common macros to net_shared.h

Currently, many programs under sample/bpf often include individual
macros by directly including the header under "linux/" rather than
using the "vmlinux.h" header.

However, there are some problems with migrating to "vmlinux.h" because
there is no definition for utility functions such as endianness
conversion (ntohs/htons). Fortunately, the xdp_sample program already
has a function that can be replaced to solve this problem.

Therefore, this commit attempts to separate these functions into a file
called net_shared.h to make them universally available. Additionally,
this file includes network-related macros that are not defined in
"vmlinux.h". (inspired by 'selftests' bpf_tracing_net.h)

Signed-off-by: Daniel T. Lee <danieltimlee@gmail.com>
Link: https://lore.kernel.org/r/20230115071613.125791-8-danieltimlee@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
21 months agosamples/bpf: replace legacy map with the BTF-defined map
Daniel T. Lee [Sun, 15 Jan 2023 07:16:09 +0000 (16:16 +0900)]
samples/bpf: replace legacy map with the BTF-defined map

With libbpf 1.0 release, support for legacy BPF map declaration syntax
had been dropped. If you run a program using legacy BPF in the latest
libbpf, the following error will be output.

    libbpf: map 'lwt_len_hist_map' (legacy): legacy map definitions are deprecated, use BTF-defined maps instead
    libbpf: Use of BPF_ANNOTATE_KV_PAIR is deprecated, use BTF-defined maps in .maps section instead

This commit replaces legacy map with the BTF-defined map.

Signed-off-by: Daniel T. Lee <danieltimlee@gmail.com>
Link: https://lore.kernel.org/r/20230115071613.125791-7-danieltimlee@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
21 months agosamples/bpf: replace broken overhead microbenchmark with fib_table_lookup
Daniel T. Lee [Sun, 15 Jan 2023 07:16:08 +0000 (16:16 +0900)]
samples/bpf: replace broken overhead microbenchmark with fib_table_lookup

The test_overhead bpf program is designed to compare performance
between tracepoint and kprobe. Initially it used task_rename and
urandom_read tracepoint.

However, commit 14c174633f34 ("random: remove unused tracepoints")
removed urandom_read tracepoint, and for this reason the test_overhead
got broken.

This commit introduces new microbenchmark using fib_table_lookup.
This microbenchmark sends UDP packets to localhost in order to invoke
fib_table_lookup.

In a nutshell:
fd = socket(AF_INET, SOCK_DGRAM, IPPROTO_UDP);
addr.sin_addr.s_addr = inet_addr(DUMMY_IP);
addr.sin_port = htons(DUMMY_PORT);
for() {
    sendto(fd, buf, strlen(buf), 0,
            (struct sockaddr *)&addr, sizeof(addr));
}

on 4 cpus in parallel:
                                            lookup per sec
base (no tracepoints, no kprobes)               381k
with kprobe at fib_table_lookup()               325k
with tracepoint at fib:fib_table_lookup         330k
with raw_tracepoint at fib:fib_table_lookup     365k

Fixes: 14c174633f34 ("random: remove unused tracepoints")

Signed-off-by: Daniel T. Lee <danieltimlee@gmail.com>
Link: https://lore.kernel.org/r/20230115071613.125791-6-danieltimlee@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
21 months agosamples/bpf: fix broken cgroup socket testing
Daniel T. Lee [Sun, 15 Jan 2023 07:16:07 +0000 (16:16 +0900)]
samples/bpf: fix broken cgroup socket testing

Currently, executing test_cgrp2_sock2 fails due to wrong section
header. This 'cgroup/sock1' style section is previously used at
'samples/bpf_load' (deprecated) BPF loader. Because this style isn't
supported in libbpf, this commit fixes this problem by correcting the
section header.

    $ sudo ./test_cgrp2_sock2.sh
    libbpf: prog 'bpf_prog1': missing BPF prog type, check ELF section name 'cgroup/sock1'
    libbpf: prog 'bpf_prog1': failed to load: -22
    libbpf: failed to load object './sock_flags_kern.o'
    ERROR: loading BPF object file failed

In addition, this BPF program filters ping packets by comparing whether
the socket type uses SOCK_RAW. However, after the ICMP socket[1] was
developed, ping sends ICMP packets using SOCK_DGRAM. Therefore, in this
commit, the packet filtering is changed to use SOCK_DGRAM instead of
SOCK_RAW.

    $ strace --trace socket ping -6 -c1 -w1 ::1
    socket(AF_INET6, SOCK_DGRAM, IPPROTO_ICMPV6) = 3

[1]: https://lwn.net/Articles/422330/

Signed-off-by: Daniel T. Lee <danieltimlee@gmail.com>
Link: https://lore.kernel.org/r/20230115071613.125791-5-danieltimlee@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
21 months agosamples/bpf: fix broken lightweight tunnel testing
Daniel T. Lee [Sun, 15 Jan 2023 07:16:06 +0000 (16:16 +0900)]
samples/bpf: fix broken lightweight tunnel testing

The test_lwt_bpf is a script that tests the functionality of BPF through
the output of the ftrace with bpf_trace_printk. Currently, this program
is not operating normally for several reasons.

First of all, this test script can't parse the ftrace results properly.
GNU sed tries to be as greedy as possible when attempting pattern
matching. Due to this, cutting metadata (such as timestamp) from the
log entry of ftrace doesn't work properly, and also desired log isn't
extracted properly. To make sed stripping clearer, 'nocontext-info'
option with the ftrace has been used to remove metadata from the log.
Also, instead of using unclear pattern matching, this commit specifies
an explicit parse pattern.

Also, unlike before when this test was introduced, the way
bpf_trace_printk behaves has changed[1]. The previous bpf_trace_printk
had to always have '\n' in order to print newline, but now that the
bpf_trace_printk call includes newline by default, so '\n' is no longer
needed.

Lastly with the lwt ENCAP_BPF out, the context information with the
sk_buff protocol is preserved. Therefore, this commit changes the
previous test result from 'protocol 0' to 'protocol 8', which means
ETH_P_IP.

[1]: commit ac5a72ea5c89 ("bpf: Use dedicated bpf_trace_printk event instead of trace_printk()")
Signed-off-by: Daniel T. Lee <danieltimlee@gmail.com>
Link: https://lore.kernel.org/r/20230115071613.125791-4-danieltimlee@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
21 months agosamples/bpf: refactor BPF functionality testing scripts
Daniel T. Lee [Sun, 15 Jan 2023 07:16:05 +0000 (16:16 +0900)]
samples/bpf: refactor BPF functionality testing scripts

Currently, some test scripts are experiencing minor errors related to
executing tests.

    $ sudo ./test_cgrp2_sock.sh
    ./test_cgrp2_sock.sh: 22: test_cgrp2_sock: not found

This problem occurs because the path to the execution target is not
properly specified. Therefore, this commit solves this problem by
specifying a relative path to its executables. This commit also makes
a concise refactoring of hard-coded BPF program names.

Signed-off-by: Daniel T. Lee <danieltimlee@gmail.com>
Link: https://lore.kernel.org/r/20230115071613.125791-3-danieltimlee@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
21 months agosamples/bpf: ensure ipv6 is enabled before running tests
Daniel T. Lee [Sun, 15 Jan 2023 07:16:04 +0000 (16:16 +0900)]
samples/bpf: ensure ipv6 is enabled before running tests

Currently, a few of BPF tests use ipv6 functionality. The problem here
is that if ipv6 is disabled, these tests will fail, and even if the
test fails, it will not tell you why it failed.

    $ sudo ./test_cgrp2_sock2.sh
    RTNETLINK answers: Permission denied

In order to fix this, this commit ensures ipv6 is enabled prior to
running tests.

Signed-off-by: Daniel T. Lee <danieltimlee@gmail.com>
Link: https://lore.kernel.org/r/20230115071613.125791-2-danieltimlee@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
21 months agoMerge branch 'bpf: Add ipip6 and ip6ip decap support for bpf_skb_adjust_room()'
Martin KaFai Lau [Sun, 15 Jan 2023 20:56:17 +0000 (12:56 -0800)]
Merge branch 'bpf: Add ipip6 and ip6ip decap support for bpf_skb_adjust_room()'

Ziyang Xuan says:

====================

Add ipip6 and ip6ip decap support for bpf_skb_adjust_room().
Main use case is for using cls_bpf on ingress hook to decapsulate
IPv4 over IPv6 and IPv6 over IPv4 tunnel packets.

And add ipip6 and ip6ip decap testcases to verify that
bpf_skb_adjust_room() correctly decapsulate ipip6 and ip6ip
tunnel packets.

$./test_tc_tunnel.sh
ipip
encap 192.168.1.1 to 192.168.1.2, type ipip, mac none len 100
test basic connectivity
0
test bpf encap without decap (expect failure)
Ncat: TIMEOUT.
1
test bpf encap with tunnel device decap
0
test bpf encap with bpf decap
0
OK
ipip6
encap 192.168.1.1 to 192.168.1.2, type ipip6, mac none len 100
test basic connectivity
0
test bpf encap without decap (expect failure)
Ncat: TIMEOUT.
1
test bpf encap with tunnel device decap
0
test bpf encap with bpf decap
0
OK
ip6ip6
encap fd::1 to fd::2, type ip6tnl, mac none len 100
test basic connectivity
0
test bpf encap without decap (expect failure)
Ncat: TIMEOUT.
1
test bpf encap with tunnel device decap
0
test bpf encap with bpf decap
0
OK
sit
encap fd::1 to fd::2, type sit, mac none len 100
test basic connectivity
0
test bpf encap without decap (expect failure)
Ncat: TIMEOUT.
1
test bpf encap with tunnel device decap
0
test bpf encap with bpf decap
0
OK
...
OK. All tests passed

v3:
  - Fix compilation failure of selftests/bpf.
  - Combine two new branches in bpf_skb_adjust_room().
  - Simplify description for new flags BPF_F_ADJ_ROOM_DECAP_L3_IP*.

v2:
  - Use decap flags to indicate the new IP header.
    Do not rely on skb->encapsulation.
====================

Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
21 months agoselftests/bpf: add ipip6 and ip6ip decap to test_tc_tunnel
Ziyang Xuan [Fri, 13 Jan 2023 09:25:10 +0000 (17:25 +0800)]
selftests/bpf: add ipip6 and ip6ip decap to test_tc_tunnel

Add ipip6 and ip6ip decap testcases. Verify that bpf_skb_adjust_room()
correctly decapsulate ipip6 and ip6ip tunnel packets.

Signed-off-by: Ziyang Xuan <william.xuanziyang@huawei.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://lore.kernel.org/r/dfd2d8cfdf9111bd129170d4345296f53bee6a67.1673574419.git.william.xuanziyang@huawei.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
21 months agobpf: Add ipip6 and ip6ip decap support for bpf_skb_adjust_room()
Ziyang Xuan [Fri, 13 Jan 2023 09:24:51 +0000 (17:24 +0800)]
bpf: Add ipip6 and ip6ip decap support for bpf_skb_adjust_room()

Add ipip6 and ip6ip decap support for bpf_skb_adjust_room().
Main use case is for using cls_bpf on ingress hook to decapsulate
IPv4 over IPv6 and IPv6 over IPv4 tunnel packets.

Add two new flags BPF_F_ADJ_ROOM_DECAP_L3_IPV{4,6} to indicate the
new IP header version after decapsulating the outer IP header.

Suggested-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Ziyang Xuan <william.xuanziyang@huawei.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://lore.kernel.org/r/b268ec7f0ff9431f4f43b1b40ab856ebb28cb4e1.1673574419.git.william.xuanziyang@huawei.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
22 months agoselftests/bpf: Fix missing space error
Roberto Valenzuela [Fri, 13 Jan 2023 18:02:57 +0000 (13:02 -0500)]
selftests/bpf: Fix missing space error

Add the missing space after 'dest' variable assignment.
This change will resolve the following checkpatch.pl
script error:

ERROR: spaces required around that '+=' (ctx:VxW)

Signed-off-by: Roberto Valenzuela <valenzuelarober@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20230113180257.39769-1-valenzuelarober@gmail.com
22 months agolibbpf: Replace '.' with '_' in legacy kprobe event name
Menglong Dong [Fri, 13 Jan 2023 09:34:27 +0000 (17:34 +0800)]
libbpf: Replace '.' with '_' in legacy kprobe event name

'.' is not allowed in the event name of kprobe. Therefore, we will get a
EINVAL if the kernel function name has a '.' in legacy kprobe attach
case, such as 'icmp_reply.constprop.0'.

In order to adapt this case, we need to replace the '.' with other char
in gen_kprobe_legacy_event_name(). And I use '_' for this propose.

Signed-off-by: Menglong Dong <imagedong@tencent.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Reviewed-by: Alan Maguire <alan.maguire@oracle.com>
Link: https://lore.kernel.org/bpf/20230113093427.1666466-1-imagedong@tencent.com
22 months agobpftool: Always disable stack protection for BPF objects
Holger Hoffstätte [Fri, 13 Jan 2023 15:40:23 +0000 (16:40 +0100)]
bpftool: Always disable stack protection for BPF objects

When the clang toolchain has stack protection enabled in order to be
consistent with gcc - which just happens to be the case on Gentoo -
the bpftool build fails:

  [...]
  clang \
-I. \
-I/tmp/portage/dev-util/bpftool-6.0.12/work/linux-6.0/tools/include/uapi/ \
-I/tmp/portage/dev-util/bpftool-6.0.12/work/linux-6.0/tools/bpf/bpftool/bootstrap/libbpf/include \
-g -O2 -Wall -target bpf -c skeleton/pid_iter.bpf.c -o pid_iter.bpf.o
  clang \
-I. \
-I/tmp/portage/dev-util/bpftool-6.0.12/work/linux-6.0/tools/include/uapi/ \
-I/tmp/portage/dev-util/bpftool-6.0.12/work/linux-6.0/tools/bpf/bpftool/bootstrap/libbpf/include \
-g -O2 -Wall -target bpf -c skeleton/profiler.bpf.c -o profiler.bpf.o
  skeleton/profiler.bpf.c:40:14: error: A call to built-in function '__stack_chk_fail' is not supported.
  int BPF_PROG(fentry_XXX)
                ^
  skeleton/profiler.bpf.c:94:14: error: A call to built-in function '__stack_chk_fail' is not supported.
  int BPF_PROG(fexit_XXX)
                ^
  2 errors generated.
  [...]

Since stack-protector makes no sense for the BPF bits just unconditionally
disable it.

Bug: https://bugs.gentoo.org/890638
Signed-off-by: Holger Hoffstätte <holger@applied-asynchrony.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Quentin Monnet <quentin@isovalent.com>
Link: https://lore.kernel.org/bpf/74cd9d2e-6052-312a-241e-2b514a75c92c@applied-asynchrony.com
22 months agoMerge branch 'selftests/xsk: speed-ups, fixes, and new XDP programs'
Alexei Starovoitov [Thu, 12 Jan 2023 02:16:53 +0000 (18:16 -0800)]
Merge branch 'selftests/xsk: speed-ups, fixes, and new XDP programs'

Magnus Karlsson says:

====================

This is a patch set of various performance improvements, fixes, and
the introduction of more than one XDP program to the xsk selftests
framework so we can test more things in the future such as upcoming
multi-buffer and metadata support for AF_XDP. The new programs just
reuse the framework that all the other eBPF selftests use. The new
feature is used to implement one new test that does XDP_DROP on every
other packet. More tests using this will be added in future commits.

Contents:

* The run-time of the test suite is cut by 10x when executing the
  tests on a real NIC, by only attaching the XDP program once per mode
  tested, instead of once per test program.

* Over 700 lines of code have been removed. The xsk.c control file was
  moved straight over from libbpf when the xsk support was deprecated
  there. As it is now not used as library code that has to work with
  all kinds of versions of Linux, a lot of code could be dropped or
  simplified.

* Add a new command line option "-d" that can be used when a test
  fails and you want to debug it with gdb or some other debugger. The
  option creates the two veth netdevs and prints them to the screen
  without deleting them afterwards. This way these veth netdevs can be
  used when running xskxceiver in a debugger.

* Implemented the possibility to load external XDP programs so we can
  have more than the default one. This feature is used to implement a
  test where every other packet is dropped. Good exercise for the
  recycling mechanism of the xsk buffer pool used in zero-copy mode.

* Various clean-ups and small fixes in patches 1 to 5. None of these
  fixes has any impact on the correct execution of the tests when they
  pass, though they can be irritating when a test fails. IMHO, they do
  not need to go to bpf as they will not fix anything there. The first
  version of patches 1, 2, and 4 where previously sent to bpf, but has
  now been included here.

v2 -> v3:
* Fixed compilation error for llvm [David]
* Made the function xsk_is_in_drv_mode(ifobj) more generic by changing
  it to xsk_is_in_mode(ifobj, xdp_mode) [Maciej]
* Added Maciej's acks to all the patches

v1 -> v2:
* Fixed spelling error in commit message of patch #6 [Björn]
* Added explanation on why it is safe to use C11 atomics in patch #7
  [Daniel]
* Put all XDP programs in the same file so that adding more XDP
  programs to xskxceiver.c becomes more scalable in patches #11 and
  #12 [Maciej]
* Removed more dead code in patch #8 [Maciej]
* Removed stale %s specifier in error print, patch #9 [Maciej]
* Changed name of XDP_CONSUMES_SOME_PACKETS to XDP_DROP_HALF to
  hopefully make it clearer [Maciej]
* ifobj_rx and ifobj_tx name changes in patch #13 [Maciej]
* Simplified XDP attachment code in patch #15 [Maciej]

Patches:
1-5:   Small fixes and clean-ups
6:     New convenient debug option when using a debugger such as gdb
7-8:   Removal of unnecessary code
9:     Add the ability to load external XDP programs
10-11: Removal of more unnecessary code
12:    Implement a new test where every other packet is XDP_DROP:ed
13:    Unify the thread dispatching code
14-15: Simplify the way tests are written when using custom packet_streams
       or custom XDP programs

Thanks: Magnus
====================

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
22 months agoselftests/xsk: automatically switch XDP programs
Magnus Karlsson [Wed, 11 Jan 2023 09:35:26 +0000 (10:35 +0100)]
selftests/xsk: automatically switch XDP programs

Implement automatic switching of XDP programs and execution modes if
needed by a test. This makes it much simpler to write a test as it
only has to say what XDP program it needs if it is not the default
one. This also makes it possible to remove the initial explicit
attachment of the XDP program as well as the explicit mode switch in
the code. These are now handled by the same code that just checks if a
switch is necessary, so no special cases are needed.

The default XDP program for all tests is one that sends all packets to
the AF_XDP socket. If you need another one, please use the new
function test_spec_set_xdp_prog() to specify what XDP programs and
maps to use for this test.

Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Link: https://lore.kernel.org/r/20230111093526.11682-16-magnus.karlsson@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
22 months agoselftests/xsk: automatically restore packet stream
Magnus Karlsson [Wed, 11 Jan 2023 09:35:25 +0000 (10:35 +0100)]
selftests/xsk: automatically restore packet stream

Automatically restore the default packet stream if needed at the end
of each test. This so that test writers do not forget to do this.

Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Link: https://lore.kernel.org/r/20230111093526.11682-15-magnus.karlsson@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
22 months agoselftests/xsk: merge dual and single thread dispatchers
Magnus Karlsson [Wed, 11 Jan 2023 09:35:24 +0000 (10:35 +0100)]
selftests/xsk: merge dual and single thread dispatchers

Make the thread dispatching code common by unifying the dual and
single thread dispatcher code. This so we do not have to add code in
two places in upcoming commits.

Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Link: https://lore.kernel.org/r/20230111093526.11682-14-magnus.karlsson@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
22 months agoselftests/xsk: add test when some packets are XDP_DROPed
Magnus Karlsson [Wed, 11 Jan 2023 09:35:23 +0000 (10:35 +0100)]
selftests/xsk: add test when some packets are XDP_DROPed

Add a new test where some of the packets are not passed to the AF_XDP
socket and instead get a XDP_DROP verdict. This is important as it
tests the recycling mechanism of the buffer pool. If a packet is not
sent to the AF_XDP socket, the buffer the packet resides in is instead
recycled so it can be used again without the round-trip to user
space. The test introduces a new XDP program that drops every other
packet.

Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Link: https://lore.kernel.org/r/20230111093526.11682-13-magnus.karlsson@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
22 months agoselftests/xsk: get rid of built-in XDP program
Magnus Karlsson [Wed, 11 Jan 2023 09:35:22 +0000 (10:35 +0100)]
selftests/xsk: get rid of built-in XDP program

Get rid of the built-in XDP program that was part of the old libbpf
code in xsk.c and replace it with an eBPF program build using the
framework by all the other bpf selftests. This will form the base for
adding more programs in later commits.

Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Link: https://lore.kernel.org/r/20230111093526.11682-12-magnus.karlsson@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
22 months agoselftests/xsk: remove unnecessary code in control path
Magnus Karlsson [Wed, 11 Jan 2023 09:35:21 +0000 (10:35 +0100)]
selftests/xsk: remove unnecessary code in control path

Remove unnecessary code in the control path. This is located in the
file xsk.c that was moved from libbpf when the xsk support there was
deprecated. Some of the code there is not needed anymore as the
selftests are only guaranteed to run on the kernel it is shipped
with. Therefore, all the code that has to deal with compatibility of
older kernels can be dropped and also any other function that is not
of any use for the tests.

Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Link: https://lore.kernel.org/r/20230111093526.11682-11-magnus.karlsson@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>