review.tizen.org Git - platform/kernel/linux-starfive.git/log

projects / platform / kernel / linux-starfive.git / log

Martin KaFai Lau [Wed, 29 Mar 2023 20:10:56 +0000 (13:10 -0700)]

Merge branch 'Allow BPF TCP CCs to write app_limited'

Yixin Shen says:

====================

This series allow BPF TCP CCs to write app_limited of struct
tcp_sock. A built-in CC or one from a kernel module is already
able to write to app_limited of struct tcp_sock. Until now,
a BPF CC doesn't have write access to this member of struct
tcp_sock.

v2:
- Merge the test of writing app_limited into the test of
writing sk_pacing. (Martin KaFai Lau)
====================

Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>

commit | commitdiff | tree

Yixin Shen [Wed, 29 Mar 2023 07:35:58 +0000 (07:35 +0000)]

selftests/bpf: test a BPF CC writing app_limited

Test whether a TCP CC implemented in BPF is allowed to write
app_limited in struct tcp_sock. This is already allowed for
the built-in TCP CC.

Signed-off-by: Yixin Shen <bobankhshen@gmail.com>
Link: https://lore.kernel.org/r/20230329073558.8136-3-bobankhshen@gmail.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>

commit | commitdiff | tree

Yixin Shen [Wed, 29 Mar 2023 07:35:57 +0000 (07:35 +0000)]

bpf: allow a TCP CC to write app_limited

A CC that implements tcp_congestion_ops.cong_control() should be able to
write app_limited. A built-in CC or one from a kernel module is already
able to write to this member of struct tcp_sock.
For a BPF program, write access has not been allowed, yet.

Signed-off-by: Yixin Shen <bobankhshen@gmail.com>
Link: https://lore.kernel.org/r/20230329073558.8136-2-bobankhshen@gmail.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>

commit | commitdiff | tree

Manu Bretelle [Wed, 29 Mar 2023 07:30:02 +0000 (00:30 -0700)]

tools: bpftool: json: Fix backslash escape typo in jsonw_puts

This is essentially a backport of iproute2's
commit ed54f76484b5 ("json: fix backslash escape typo in jsonw_puts")

Also added the stdio.h include in json_writer.h to be able to compile
and run the json_writer test as used below).

Before this fix:

$ gcc -D notused -D TEST -I../../include -o json_writer  json_writer.c
json_writer.h
$ ./json_writer
{
    "Vyatta": {
        "url": "http://vyatta.com",
        "downloads": 2000000,
        "stock": 8.16,
        "ARGV": [],
        "empty": [],
        "NIL": {},
        "my_null": null,
        "special chars": [
            "slash": "/",
            "newline": "\n",
            "tab": "\t",
            "ff": "\f",
            "quote": "\"",
            "tick": "'",
            "backslash": "\n"
        ]
    }
}

After:

$ gcc -D notused -D TEST -I../../include -o json_writer  json_writer.c
json_writer.h
$ ./json_writer
{
    "Vyatta": {
        "url": "http://vyatta.com",
        "downloads": 2000000,
        "stock": 8.16,
        "ARGV": [],
        "empty": [],
        "NIL": {},
        "my_null": null,
        "special chars": [
            "slash": "/",
            "newline": "\n",
            "tab": "\t",
            "ff": "\f",
            "quote": "\"",
            "tick": "'",
            "backslash": "\\"
        ]
    }
}

Signed-off-by: Manu Bretelle <chantr4@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Reviewed-by: Quentin Monnet <quentin@isovalent.com>
Link: https://lore.kernel.org/bpf/20230329073002.2026563-1-chantr4@gmail.com

commit | commitdiff | tree

Andrii Nakryiko [Tue, 28 Mar 2023 21:48:27 +0000 (14:48 -0700)]

Merge branch 'verifier/xdp_direct_packet_access.c converted to inline assembly'

Eduard Zingerman says:

====================

verifier/xdp_direct_packet_access.c automatically converted to inline
assembly using [1].

This is a leftover from [2], the last patch in a batch was blocked by
mail server for being too long. This patch-set splits it in two:
- one to add migrated test to progs/
- one to remove old test from verifier/

[1] Migration tool
https://github.com/eddyz87/verifier-tests-migrator
[2] First batch of migrated verifier/*.c tests
https://lore.kernel.org/bpf/167979433109.17761.17302808621381963629.git-patchwork-notify@kernel.org/
====================

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>

commit | commitdiff | tree

Eduard Zingerman [Tue, 28 Mar 2023 02:08:13 +0000 (05:08 +0300)]

selftests/bpf: Remove verifier/xdp_direct_packet_access.c, converted to progs/verifier_xdp_direct_packet_access.c

Removing verifier/xdp_direct_packet_access.c.c as it was automatically converted to use
inline assembly in the previous commit. It is available in
progs/verifier_xdp_direct_packet_access.c.c.

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20230328020813.392560-3-eddyz87@gmail.com

commit | commitdiff | tree

Eduard Zingerman [Tue, 28 Mar 2023 02:08:12 +0000 (05:08 +0300)]

selftests/bpf: Verifier/xdp_direct_packet_access.c converted to inline assembly

Test verifier/xdp_direct_packet_access.c automatically converted to use inline assembly.
Original test would be removed in the next patch.

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20230328020813.392560-2-eddyz87@gmail.com

commit | commitdiff | tree

Eduard Zingerman [Tue, 28 Mar 2023 00:47:38 +0000 (03:47 +0300)]

libbpf: Fix double-free when linker processes empty sections

Double-free error in bpf_linker__free() was reported by James Hilliard.
The error is caused by miss-use of realloc() in extend_sec().
The error occurs when two files with empty sections of the same name
are linked:
- when first file is processed:
  - extend_sec() calls realloc(dst->raw_data, dst_align_sz)
    with dst->raw_data == NULL and dst_align_sz == 0;
  - dst->raw_data is set to a special pointer to a memory block of
    size zero;
- when second file is processed:
  - extend_sec() calls realloc(dst->raw_data, dst_align_sz)
    with dst->raw_data == <special pointer> and dst_align_sz == 0;
  - realloc() "frees" dst->raw_data special pointer and returns NULL;
  - extend_sec() exits with -ENOMEM, and the old dst->raw_data value
    is preserved (it is now invalid);
  - eventually, bpf_linker__free() attempts to free dst->raw_data again.

This patch fixes the bug by avoiding -ENOMEM exit for dst_align_sz == 0.
The fix was suggested by Andrii Nakryiko <andrii.nakryiko@gmail.com>.

Reported-by: James Hilliard <james.hilliard1@gmail.com>
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Tested-by: James Hilliard <james.hilliard1@gmail.com>
Link: https://lore.kernel.org/bpf/CADvTj4o7ZWUikKwNTwFq0O_AaX+46t_+Ca9gvWMYdWdRtTGeHQ@mail.gmail.com/
Link: https://lore.kernel.org/bpf/20230328004738.381898-3-eddyz87@gmail.com

commit | commitdiff | tree

Hengqi Chen [Sun, 26 Mar 2023 09:53:41 +0000 (09:53 +0000)]

selftests/bpf: Don't assume page size is 4096

The verifier test creates BPF ringbuf maps using hard-coded
4096 as max_entries. Some tests will fail if the page size
of the running kernel is not 4096. Use getpagesize() instead.

Signed-off-by: Hengqi Chen <hengqi.chen@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20230326095341.816023-1-hengqi.chen@gmail.com

commit | commitdiff | tree

JP Kobryn [Sat, 25 Mar 2023 01:08:45 +0000 (18:08 -0700)]

libbpf: Ensure print callback usage is thread-safe

This patch prevents races on the print function pointer, allowing the
libbpf_set_print() function to become thread-safe.

Signed-off-by: JP Kobryn <inwardvessel@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20230325010845.46000-1-inwardvessel@gmail.com

commit | commitdiff | tree

Nuno Gonçalves [Fri, 24 Mar 2023 10:02:22 +0000 (10:02 +0000)]

xsk: allow remap of fill and/or completion rings

The remap of fill and completion rings was frowned upon as they
control the usage of UMEM which does not support concurrent use.
At the same time this would disallow the remap of these rings
into another process.

A possible use case is that the user wants to transfer the socket/
UMEM ownership to another process (via SYS_pidfd_getfd) and so
would need to also remap these rings.

This will have no impact on current usages and just relaxes the
remap limitation.

Signed-off-by: Nuno Gonçalves <nunog@fr24.com>
Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Acked-by: Magnus Karlsson <magnus.karlsson@intel.com>
Link: https://lore.kernel.org/r/20230324100222.13434-1-nunog@fr24.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

commit | commitdiff | tree

Dave Thaler [Sun, 26 Mar 2023 03:31:17 +0000 (03:31 +0000)]

bpf, docs: Add extended call instructions

Add extended call instructions. Uses the term "program-local" for
call by offset. And there are instructions for calling helper functions
by "address" (the old way of using integer values), and for calling
helper functions by BTF ID (for kfuncs).

V1 -> V2: addressed comments from David Vernet

V2 -> V3: make descriptions in table consistent with updated names

V3 -> V4: addressed comments from Alexei

V4 -> V5: fixed alignment

Signed-off-by: Dave Thaler <dthaler@microsoft.com>
Acked-by: David Vernet <void@manifault.com>
Link: https://lore.kernel.org/r/20230326033117.1075-1-dthaler1968@googlemail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

commit | commitdiff | tree

Alexei Starovoitov [Sun, 26 Mar 2023 02:52:52 +0000 (19:52 -0700)]

Merge branch 'bpf: Use bpf_mem_cache_alloc/free in bpf_local_storage'

Martin KaFai Lau says:

====================

From: Martin KaFai Lau <martin.lau@kernel.org>

This set is a continuation of the effort in using
bpf_mem_cache_alloc/free in bpf_local_storage [1]

Major change is only using bpf_mem_alloc for task and cgrp storage
while sk and inode stay with kzalloc/kfree. The details is
in patch 2.

[1]: https://lore.kernel.org/bpf/20230308065936.1550103-1-martin.lau@linux.dev/

v3:
- Only use bpf_mem_alloc for task and cgrp storage.
- sk and inode storage stay with kzalloc/kfree.
- Check NULL and add comments in bpf_mem_cache_raw_free() in patch 1.
- Added test and benchmark for task storage.

v2:
- Added bpf_mem_cache_alloc_flags() and bpf_mem_cache_raw_free()
to hide the internal data structure of the bpf allocator.
- Fixed a typo bug in bpf_selem_free()
- Simplified the test_local_storage test by directly using
err returned from libbpf
====================

Signed-off-by: Alexei Starovoitov <ast@kernel.org>

commit | commitdiff | tree

Martin KaFai Lau [Wed, 22 Mar 2023 21:52:46 +0000 (14:52 -0700)]

selftests/bpf: Add bench for task storage creation

This patch adds a task storage benchmark to the existing
local-storage-create benchmark.

For task storage,
./bench --storage-type task --batch-size 32:
   bpf_ma: Summary: creates   30.456 ± 0.507k/s ( 30.456k/prod), 6.08 kmallocs/create
no bpf_ma: Summary: creates   31.962 ± 0.486k/s ( 31.962k/prod), 6.13 kmallocs/create

./bench --storage-type task --batch-size 64:
   bpf_ma: Summary: creates   30.197 ± 1.476k/s ( 30.197k/prod), 6.08 kmallocs/create
no bpf_ma: Summary: creates   31.103 ± 0.297k/s ( 31.103k/prod), 6.13 kmallocs/create

Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://lore.kernel.org/r/20230322215246.1675516-6-martin.lau@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

commit | commitdiff | tree

Martin KaFai Lau [Wed, 22 Mar 2023 21:52:45 +0000 (14:52 -0700)]

selftests/bpf: Test task storage when local_storage->smap is NULL

The current sk storage test ensures the memory free works when
the local_storage->smap is NULL.

This patch adds a task storage test to ensure the memory free
code path works when local_storage->smap is NULL.

Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://lore.kernel.org/r/20230322215246.1675516-5-martin.lau@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

commit | commitdiff | tree

Martin KaFai Lau [Wed, 22 Mar 2023 21:52:44 +0000 (14:52 -0700)]

bpf: Use bpf_mem_cache_alloc/free for bpf_local_storage

This patch uses bpf_mem_cache_alloc/free for allocating and freeing
bpf_local_storage for task and cgroup storage.

The changes are similar to the previous patch. A few things that
worth to mention for bpf_local_storage:

The local_storage is freed when the last selem is deleted.
Before deleting a selem from local_storage, it needs to retrieve the
local_storage->smap because the bpf_selem_unlink_storage_nolock()
may have set it to NULL. Note that local_storage->smap may have
already been NULL when the selem created this local_storage has
been removed. In this case, call_rcu will be used to free the
local_storage.
Also, the bpf_ma (true or false) value is needed before calling
bpf_local_storage_free(). The bpf_ma can either be obtained from
the local_storage->smap (if available) or any of its selem's smap.
A new helper check_storage_bpf_ma() is added to obtain
bpf_ma for a deleting bpf_local_storage.

When bpf_local_storage_alloc getting a reused memory, all
fields are either in the correct values or will be initialized.
'cache[]' must already be all NULLs. 'list' must be empty.
Others will be initialized.

Cc: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://lore.kernel.org/r/20230322215246.1675516-4-martin.lau@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

commit | commitdiff | tree

Martin KaFai Lau [Wed, 22 Mar 2023 21:52:43 +0000 (14:52 -0700)]

bpf: Use bpf_mem_cache_alloc/free in bpf_local_storage_elem

This patch uses bpf_mem_alloc for the task and cgroup local storage that
the bpf prog can easily get a hold of the storage owner's PTR_TO_BTF_ID.
eg. bpf_get_current_task_btf() can be used in some of the kmalloc code
path which will cause deadlock/recursion. bpf_mem_cache_alloc is
deadlock free and will solve a legit use case in [1].

For sk storage, its batch creation benchmark shows a few percent
regression when the sk create/destroy batch size is larger than 32.
The sk creation/destruction happens much more often and
depends on external traffic. Considering it is hypothetical
to be able to cause deadlock with sk storage, it can cross
the bridge to use bpf_mem_alloc till a legit (ie. useful)
use case comes up.

For inode storage, bpf_local_storage_destroy() is called before
waiting for a rcu gp and its memory cannot be reused immediately.
inode stays with kmalloc/kfree after the rcu [or tasks_trace] gp.

A 'bool bpf_ma' argument is added to bpf_local_storage_map_alloc().
Only task and cgroup storage have 'bpf_ma == true' which
means to use bpf_mem_cache_alloc/free(). This patch only changes
selem to use bpf_mem_alloc for task and cgroup. The next patch
will change the local_storage to use bpf_mem_alloc also for
task and cgroup.

Here is some more details on the changes:

* memory allocation:
After bpf_mem_cache_alloc(), the SDATA(selem)->data is zero-ed because
bpf_mem_cache_alloc() could return a reused selem. It is to keep
the existing bpf_map_kzalloc() behavior. Only SDATA(selem)->data
is zero-ed. SDATA(selem)->data is the visible part to the bpf prog.
No need to use zero_map_value() to do the zeroing because
bpf_selem_free(..., reuse_now = true) ensures no bpf prog is using
the selem before returning the selem through bpf_mem_cache_free().
For the internal fields of selem, they will be initialized when
linking to the new smap and the new local_storage.

When 'bpf_ma == false', nothing changes in this patch. It will
stay with the bpf_map_kzalloc().

* memory free:
The bpf_selem_free() and bpf_selem_free_rcu() are modified to handle
the bpf_ma == true case.

For the common selem free path where its owner is also being destroyed,
the mem is freed in bpf_local_storage_destroy(), the owner (task
and cgroup) has gone through a rcu gp. The memory can be reused
immediately, so bpf_local_storage_destroy() will call
bpf_selem_free(..., reuse_now = true) which will do
bpf_mem_cache_free() for immediate reuse consideration.

An exception is the delete elem code path. The delete elem code path
is called from the helper bpf_*_storage_delete() and the syscall
bpf_map_delete_elem(). This path is an unusual case for local
storage because the common use case is to have the local storage
staying with its owner life time so that the bpf prog and the user
space does not have to monitor the owner's destruction. For the delete
elem path, the selem cannot be reused immediately because there could
be bpf prog using it. It will call bpf_selem_free(..., reuse_now = false)
and it will wait for a rcu tasks trace gp before freeing the elem. The
rcu callback is changed to do bpf_mem_cache_raw_free() instead of kfree().

When 'bpf_ma == false', it should be the same as before.
__bpf_selem_free() is added to do the kfree_rcu and call_tasks_trace_rcu().
A few words on the 'reuse_now == true'. When 'reuse_now == true',
it is still racing with bpf_local_storage_map_free which is under rcu
protection, so it still needs to wait for a rcu gp instead of kfree().
Otherwise, the selem may be reused by slab for a totally different struct
while the bpf_local_storage_map_free() is still using it (as a
rcu reader). For the inode case, there may be other rcu readers also.
In short, when bpf_ma == false and reuse_now == true => vanilla rcu.

[1]: https://lore.kernel.org/bpf/20221118190109.1512674-1-namhyung@kernel.org/

Cc: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://lore.kernel.org/r/20230322215246.1675516-3-martin.lau@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

commit | commitdiff | tree

Martin KaFai Lau [Wed, 22 Mar 2023 21:52:42 +0000 (14:52 -0700)]

bpf: Add a few bpf mem allocator functions

This patch adds a few bpf mem allocator functions which will
be used in the bpf_local_storage in a later patch.

bpf_mem_cache_alloc_flags(..., gfp_t flags) is added. When the
flags == GFP_KERNEL, it will fallback to __alloc(..., GFP_KERNEL).
bpf_local_storage knows its running context is sleepable (GFP_KERNEL)
and provides a better guarantee on memory allocation.

bpf_local_storage has some uncommon cases that its selem
cannot be reused immediately. It handles its own
rcu_head and goes through a rcu_trace gp and then free it.
bpf_mem_cache_raw_free() is added for direct free purpose
without leaking the LLIST_NODE_SZ internal knowledge.
During free time, the 'struct bpf_mem_alloc *ma' is no longer
available. However, the caller should know if it is
percpu memory or not and it can call different raw_free functions.
bpf_local_storage does not support percpu value, so only
the non-percpu 'bpf_mem_cache_raw_free()' is added in
this patch.

Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://lore.kernel.org/r/20230322215246.1675516-2-martin.lau@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

commit | commitdiff | tree

Alexei Starovoitov [Sun, 26 Mar 2023 00:02:06 +0000 (17:02 -0700)]

Merge branch 'First set of verifier/*.c migrated to inline assembly'

Eduard Zingerman says:

====================

This is a follow up for RFC [1]. It migrates a first batch of 38
verifier/*.c tests to inline assembly and use of ./test_progs for
actual execution. The migration is done by a python script (see [2]).

Each migrated verifier/xxx.c file is mapped to progs/verifier_xxx.c
plus an entry in the prog_tests/verifier.c. One patch per each file.

A few patches at the beginning of the patch-set extend test_loader
with necessary functionality, mainly:
- support for tests execution in unprivileged mode;
- support for test runs for test programs.

Migrated tests could be selected for execution using the following filter:

  ./test_progs -a verifier_*

An example of the migrated test:

  SEC("xdp")
  __description("XDP pkt read, pkt_data' > pkt_end, corner case, good access")
  __success __retval(0) __flag(BPF_F_ANY_ALIGNMENT)
  __naked void end_corner_case_good_access_1(void)
  {
          asm volatile ("                                 \
          r2 = *(u32*)(r1 + %[xdp_md_data]);              \
          r3 = *(u32*)(r1 + %[xdp_md_data_end]);          \
          r1 = r2;                                        \
          r1 += 8;                                        \
          if r1 > r3 goto l0_%=;                          \
          r0 = *(u64*)(r1 - 8);                           \
  l0_%=:  r0 = 0;                                         \
          exit;                                           \
  "       :
          : __imm_const(xdp_md_data, offsetof(struct xdp_md, data)),
            __imm_const(xdp_md_data_end, offsetof(struct xdp_md, data_end))
          : __clobber_all);
  }

Changes compared to RFC:
- test_loader.c is extended to support test program runs;
- capabilities handling now matches behavior of test_verifier;
- BPF_ST_MEM instructions are automatically replaced by BPF_STX_MEM
  instructions to overcome current clang limitations;
- tests styling updates according to RFC feedback;
- 38 migrated files are included instead of 1.

I used the following means for testing:
- migration tool itself has a set of self-tests;
- migrated tests are passing;
- manually compared each old/new file side-by-side.

While doing side-by-side comparison I've noted a few defects in the
original tests:
- and.c:
  - One of the jump targets is off by one;
  - BPF_ST_MEM wrong OFF/IMM ordering;
- array_access.c:
  - BPF_ST_MEM wrong OFF/IMM ordering;
- value_or_null.c:
  - BPF_ST_MEM wrong OFF/IMM ordering.

These defects would be addressed separately.

[1] RFC
    https://lore.kernel.org/bpf/20230123145148.2791939-1-eddyz87@gmail.com/
[2] Migration tool
    https://github.com/eddyz87/verifier-tests-migrator
====================

Signed-off-by: Alexei Starovoitov <ast@kernel.org>

commit | commitdiff | tree

Eduard Zingerman [Sat, 25 Mar 2023 02:55:23 +0000 (04:55 +0200)]

selftests/bpf: verifier/xdp.c converted to inline assembly

Test verifier/xdp.c automatically converted to use inline assembly.

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20230325025524.144043-43-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

commit | commitdiff | tree

Eduard Zingerman [Sat, 25 Mar 2023 02:55:22 +0000 (04:55 +0200)]

selftests/bpf: verifier/xadd.c converted to inline assembly

Test verifier/xadd.c automatically converted to use inline assembly.

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20230325025524.144043-42-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

commit | commitdiff | tree

Eduard Zingerman [Sat, 25 Mar 2023 02:55:21 +0000 (04:55 +0200)]

selftests/bpf: verifier/var_off.c converted to inline assembly

Test verifier/var_off.c automatically converted to use inline assembly.

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20230325025524.144043-41-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

commit | commitdiff | tree

Eduard Zingerman [Sat, 25 Mar 2023 02:55:20 +0000 (04:55 +0200)]

selftests/bpf: verifier/value_or_null.c converted to inline assembly

Test verifier/value_or_null.c automatically converted to use inline assembly.

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20230325025524.144043-40-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

commit | commitdiff | tree

Eduard Zingerman [Sat, 25 Mar 2023 02:55:19 +0000 (04:55 +0200)]

selftests/bpf: verifier/value.c converted to inline assembly

Test verifier/value.c automatically converted to use inline assembly.

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20230325025524.144043-39-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

commit | commitdiff | tree

Eduard Zingerman [Sat, 25 Mar 2023 02:55:18 +0000 (04:55 +0200)]

selftests/bpf: verifier/value_adj_spill.c converted to inline assembly

Test verifier/value_adj_spill.c automatically converted to use inline assembly.

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20230325025524.144043-38-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

commit | commitdiff | tree

Eduard Zingerman [Sat, 25 Mar 2023 02:55:17 +0000 (04:55 +0200)]

selftests/bpf: verifier/uninit.c converted to inline assembly

Test verifier/uninit.c automatically converted to use inline assembly.

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20230325025524.144043-37-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

commit | commitdiff | tree

Eduard Zingerman [Sat, 25 Mar 2023 02:55:16 +0000 (04:55 +0200)]

selftests/bpf: verifier/stack_ptr.c converted to inline assembly

Test verifier/stack_ptr.c automatically converted to use inline assembly.

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20230325025524.144043-36-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

commit | commitdiff | tree

Eduard Zingerman [Sat, 25 Mar 2023 02:55:15 +0000 (04:55 +0200)]

selftests/bpf: verifier/spill_fill.c converted to inline assembly

Test verifier/spill_fill.c automatically converted to use inline assembly.

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20230325025524.144043-35-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

commit | commitdiff | tree

Eduard Zingerman [Sat, 25 Mar 2023 02:55:14 +0000 (04:55 +0200)]

selftests/bpf: verifier/ringbuf.c converted to inline assembly

Test verifier/ringbuf.c automatically converted to use inline assembly.

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20230325025524.144043-34-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

commit | commitdiff | tree

Eduard Zingerman [Sat, 25 Mar 2023 02:55:13 +0000 (04:55 +0200)]

selftests/bpf: verifier/raw_tp_writable.c converted to inline assembly

Test verifier/raw_tp_writable.c automatically converted to use inline assembly.

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20230325025524.144043-33-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

commit | commitdiff | tree

Eduard Zingerman [Sat, 25 Mar 2023 02:55:12 +0000 (04:55 +0200)]

selftests/bpf: verifier/raw_stack.c converted to inline assembly

Test verifier/raw_stack.c automatically converted to use inline assembly.

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20230325025524.144043-32-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

commit | commitdiff | tree

Eduard Zingerman [Sat, 25 Mar 2023 02:55:11 +0000 (04:55 +0200)]

selftests/bpf: verifier/meta_access.c converted to inline assembly

Test verifier/meta_access.c automatically converted to use inline assembly.

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20230325025524.144043-31-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

commit | commitdiff | tree

Eduard Zingerman [Sat, 25 Mar 2023 02:55:10 +0000 (04:55 +0200)]

selftests/bpf: verifier/masking.c converted to inline assembly

Test verifier/masking.c automatically converted to use inline assembly.

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20230325025524.144043-30-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

commit | commitdiff | tree

Eduard Zingerman [Sat, 25 Mar 2023 02:55:09 +0000 (04:55 +0200)]

selftests/bpf: verifier/map_ret_val.c converted to inline assembly

Test verifier/map_ret_val.c automatically converted to use inline assembly.

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20230325025524.144043-29-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

commit | commitdiff | tree

Eduard Zingerman [Sat, 25 Mar 2023 02:55:08 +0000 (04:55 +0200)]

selftests/bpf: verifier/map_ptr.c converted to inline assembly

Test verifier/map_ptr.c automatically converted to use inline assembly.

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20230325025524.144043-28-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

commit | commitdiff | tree

Eduard Zingerman [Sat, 25 Mar 2023 02:55:07 +0000 (04:55 +0200)]

selftests/bpf: verifier/leak_ptr.c converted to inline assembly

Test verifier/leak_ptr.c automatically converted to use inline assembly.

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20230325025524.144043-27-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

commit | commitdiff | tree

Eduard Zingerman [Sat, 25 Mar 2023 02:55:06 +0000 (04:55 +0200)]

selftests/bpf: verifier/ld_ind.c converted to inline assembly

Test verifier/ld_ind.c automatically converted to use inline assembly.

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20230325025524.144043-26-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

commit | commitdiff | tree

Eduard Zingerman [Sat, 25 Mar 2023 02:55:05 +0000 (04:55 +0200)]

selftests/bpf: verifier/int_ptr.c converted to inline assembly

Test verifier/int_ptr.c automatically converted to use inline assembly.

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20230325025524.144043-25-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

commit | commitdiff | tree

Eduard Zingerman [Sat, 25 Mar 2023 02:55:04 +0000 (04:55 +0200)]

selftests/bpf: verifier/helper_value_access.c converted to inline assembly

Test verifier/helper_value_access.c automatically converted to use inline assembly.

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20230325025524.144043-24-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

commit | commitdiff | tree

Eduard Zingerman [Sat, 25 Mar 2023 02:55:03 +0000 (04:55 +0200)]

selftests/bpf: verifier/helper_restricted.c converted to inline assembly

Test verifier/helper_restricted.c automatically converted to use inline assembly.

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20230325025524.144043-23-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

commit | commitdiff | tree

Eduard Zingerman [Sat, 25 Mar 2023 02:55:02 +0000 (04:55 +0200)]

selftests/bpf: verifier/helper_packet_access.c converted to inline assembly

Test verifier/helper_packet_access.c automatically converted to use inline assembly.

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20230325025524.144043-22-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

commit | commitdiff | tree

Eduard Zingerman [Sat, 25 Mar 2023 02:55:01 +0000 (04:55 +0200)]

selftests/bpf: verifier/helper_access_var_len.c converted to inline assembly

Test verifier/helper_access_var_len.c automatically converted to use inline assembly.

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20230325025524.144043-21-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

commit | commitdiff | tree

Eduard Zingerman [Sat, 25 Mar 2023 02:55:00 +0000 (04:55 +0200)]

selftests/bpf: verifier/div_overflow.c converted to inline assembly

Test verifier/div_overflow.c automatically converted to use inline assembly.

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20230325025524.144043-20-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

commit | commitdiff | tree

Eduard Zingerman [Sat, 25 Mar 2023 02:54:59 +0000 (04:54 +0200)]

selftests/bpf: verifier/div0.c converted to inline assembly

Test verifier/div0.c automatically converted to use inline assembly.

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20230325025524.144043-19-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

commit | commitdiff | tree

Eduard Zingerman [Sat, 25 Mar 2023 02:54:58 +0000 (04:54 +0200)]

selftests/bpf: verifier/direct_stack_access_wraparound.c converted to inline assembly

Test verifier/direct_stack_access_wraparound.c automatically converted to use inline assembly.

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20230325025524.144043-18-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

commit | commitdiff | tree

Eduard Zingerman [Sat, 25 Mar 2023 02:54:57 +0000 (04:54 +0200)]

selftests/bpf: verifier/ctx_sk_msg.c converted to inline assembly

Test verifier/ctx_sk_msg.c automatically converted to use inline assembly.

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20230325025524.144043-17-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

commit | commitdiff | tree

Eduard Zingerman [Sat, 25 Mar 2023 02:54:56 +0000 (04:54 +0200)]

selftests/bpf: verifier/const_or.c converted to inline assembly

Test verifier/const_or.c automatically converted to use inline assembly.

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20230325025524.144043-16-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

commit | commitdiff | tree

Eduard Zingerman [Sat, 25 Mar 2023 02:54:55 +0000 (04:54 +0200)]

selftests/bpf: verifier/cgroup_storage.c converted to inline assembly

Test verifier/cgroup_storage.c automatically converted to use inline assembly.

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20230325025524.144043-15-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

commit | commitdiff | tree

Eduard Zingerman [Sat, 25 Mar 2023 02:54:54 +0000 (04:54 +0200)]

selftests/bpf: verifier/cgroup_skb.c converted to inline assembly

Test verifier/cgroup_skb.c automatically converted to use inline assembly.

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20230325025524.144043-14-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

commit | commitdiff | tree

Eduard Zingerman [Sat, 25 Mar 2023 02:54:53 +0000 (04:54 +0200)]

selftests/bpf: verifier/cgroup_inv_retcode.c converted to inline assembly

Test verifier/cgroup_inv_retcode.c automatically converted to use inline assembly.

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20230325025524.144043-13-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

commit | commitdiff | tree

Eduard Zingerman [Sat, 25 Mar 2023 02:54:52 +0000 (04:54 +0200)]

selftests/bpf: verifier/cfg.c converted to inline assembly

Test verifier/cfg.c automatically converted to use inline assembly.

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20230325025524.144043-12-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

commit | commitdiff | tree

Eduard Zingerman [Sat, 25 Mar 2023 02:54:51 +0000 (04:54 +0200)]

selftests/bpf: verifier/bounds_mix_sign_unsign.c converted to inline assembly

Test verifier/bounds_mix_sign_unsign.c automatically converted to use inline assembly.

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20230325025524.144043-11-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

commit | commitdiff | tree

Eduard Zingerman [Sat, 25 Mar 2023 02:54:50 +0000 (04:54 +0200)]

selftests/bpf: verifier/bounds_deduction.c converted to inline assembly

Test verifier/bounds_deduction.c automatically converted to use inline assembly.

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20230325025524.144043-10-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

commit | commitdiff | tree

Eduard Zingerman [Sat, 25 Mar 2023 02:54:49 +0000 (04:54 +0200)]

selftests/bpf: verifier/basic_stack.c converted to inline assembly

Test verifier/basic_stack.c automatically converted to use inline assembly.

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20230325025524.144043-9-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

commit | commitdiff | tree

Eduard Zingerman [Sat, 25 Mar 2023 02:54:48 +0000 (04:54 +0200)]

selftests/bpf: verifier/array_access.c converted to inline assembly

Test verifier/array_access.c automatically converted to use inline assembly.

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20230325025524.144043-8-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

commit | commitdiff | tree

Eduard Zingerman [Sat, 25 Mar 2023 02:54:47 +0000 (04:54 +0200)]

selftests/bpf: verifier/and.c converted to inline assembly

Test verifier/and.c automatically converted to use inline assembly.

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20230325025524.144043-7-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

commit | commitdiff | tree

Eduard Zingerman [Sat, 25 Mar 2023 02:54:46 +0000 (04:54 +0200)]

selftests/bpf: prog_tests entry point for migrated test_verifier tests

prog_tests/verifier.c would be used as a host for verifier/*.c tests
migrated to use inline assembly and run from test_progs.

The run_test_aux() function mimics the test_verifier behavior
dropping CAP_SYS_ADMIN upon entry.

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20230325025524.144043-6-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

commit | commitdiff | tree

Eduard Zingerman [Sat, 25 Mar 2023 02:54:45 +0000 (04:54 +0200)]

selftests/bpf: Tests execution support for test_loader.c

Extends test_loader.c:test_loader__run_subtests() by allowing to
execute BPF_PROG_TEST_RUN bpf command for selected programs.
This is similar to functionality provided by test_verifier.

Adds the following new attributes controlling test_loader behavior:

  __retval(...)
  __retval_unpriv(...)

* If any of these attributes is present, the annotated program would
  be executed using libbpf's bpf_prog_test_run_opts() function.
* If __retval is present, the test run would be done for program
  loaded in privileged mode.
* If __retval_unpriv is present, the test run would be done for
  program loaded in unprivileged mode.
* To mimic test_verifier behavior, the actual run is initiated in
  privileged mode.
* The value returned by a test run is compared against retval
  parameter.

The retval attribute takes one of the following parameters:
- a decimal number
- a hexadecimal number (must start from '0x')
- any of a three special literals (provided for compatibility with
  test_verifier):
  - INT_MIN
  - POINTER_VALUE
  - TEST_DATA_LEN

An example of the attribute usage:

  SEC("socket")
  __description("return 42")
  __success __success_unpriv __retval(42)
  __naked void the_42_test(void)
  {
          asm volatile ("                                 \
          r0 = 42;                                        \
          exit;                                           \
  "       ::: __clobber_all);
  }

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20230325025524.144043-5-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

commit | commitdiff | tree

Eduard Zingerman [Sat, 25 Mar 2023 02:54:44 +0000 (04:54 +0200)]

selftests/bpf: Unprivileged tests for test_loader.c

Extends test_loader.c:test_loader__run_subtests() by allowing to
execute tests in unprivileged mode, similar to test_verifier.c.

Adds the following new attributes controlling test_loader behavior:

  __msg_unpriv
  __success_unpriv
  __failure_unpriv

* If any of these attributes is present the test would be loaded in
  unprivileged mode.
* If only "privileged" attributes are present the test would be loaded
  only in privileged mode.
* If both "privileged" and "unprivileged" attributes are present the
  test would be loaded in both modes.
* If test has to be executed in both modes, __msg(text) is specified
  and __msg_unpriv is not specified the behavior is the same as if
  __msg_unpriv(text) is specified.
* For test filtering purposes the name of the program loaded in
  unprivileged mode is derived from the usual program name by adding
  `@unpriv' suffix.

Also adds attribute '__description'. This attribute specifies text to
be used instead of a program name for display and filtering purposes.

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20230325025524.144043-4-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

commit | commitdiff | tree

Eduard Zingerman [Sat, 25 Mar 2023 02:54:43 +0000 (04:54 +0200)]

selftests/bpf: __imm_insn & __imm_const macro for bpf_misc.h

Add two convenience macro for BPF test cases,
allowing the following usage:

  #include <linux/filter.h>

  ...
  asm volatile (
  ...
  ".8byte %[raw_insn];"
  ...
  "r1 += %[st_foo_offset];"
  ...
  :
  : __imm_insn(raw_insn, BPF_RAW_INSN(...)),
    __imm_const(st_foo_offset, offsetof(struct st, foo))
  : __clobber_all);

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20230325025524.144043-3-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

commit | commitdiff | tree

Eduard Zingerman [Sat, 25 Mar 2023 02:54:42 +0000 (04:54 +0200)]

selftests/bpf: Report program name on parse_test_spec error

Change test_loader.c:run_subtest() behavior to show BPF program name
when test spec for that program can't be parsed.

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20230325025524.144043-2-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

commit | commitdiff | tree

Alexei Starovoitov [Sat, 25 Mar 2023 23:56:22 +0000 (16:56 -0700)]

Merge branch 'Don't invoke KPTR_REF destructor on NULL xchg'

David Vernet says:

====================

When a map value is being freed, we loop over all of the fields of the
corresponding BPF object and issue the appropriate cleanup calls
corresponding to the field's type. If the field is a referenced kptr, we
atomically xchg the value out of the map, and invoke the kptr's
destructor on whatever was there before.

Currently, we always invoke the destructor (or bpf_obj_drop() for a
local kptr) on any kptr, including if no value was xchg'd out of the
map. This means that any function serving as the kptr's KF_RELEASE
destructor must always treat the argument as possibly NULL, and we
invoke unnecessary (and seemingly unsafe) cleanup logic for the local
kptr path as well.

This is an odd requirement -- KF_RELEASE kfuncs that are invoked by BPF
programs do not have this restriction, and the verifier will fail to
load the program if the register containing the to-be-released type has
any untrusted modifiers (e.g. PTR_UNTRUSTED or PTR_MAYBE_NULL). So as to
simplify the expectations required for a KF_RELEASE kfunc, this patch
set updates the KPTR_REF destructor logic to only be invoked when a
non-NULL value is xchg'd out of the map.

Additionally, the patch removes now-unnecessary KF_RELEASE calls from
several kfuncs, and finally, updates the verifier to have KF_RELEASE
automatically imply KF_TRUSTED_ARGS. This restriction was already
implicitly happening because of the aforementioned logic in the verifier
to reject any regs with untrusted modifiers, and to enforce that
KF_RELEASE args are passed with a 0 offset. This change just updates the
behavior to match that of other trusted args. This patch is left to the
end of the series in case it happens to be controversial, as it arguably
is slightly orthogonal to the purpose of the rest of the series.
====================

Signed-off-by: Alexei Starovoitov <ast@kernel.org>

commit | commitdiff | tree

David Vernet [Sat, 25 Mar 2023 21:31:46 +0000 (16:31 -0500)]

bpf: Treat KF_RELEASE kfuncs as KF_TRUSTED_ARGS

KF_RELEASE kfuncs are not currently treated as having KF_TRUSTED_ARGS,
even though they have a superset of the requirements of KF_TRUSTED_ARGS.
Like KF_TRUSTED_ARGS, KF_RELEASE kfuncs require a 0-offset argument, and
don't allow NULL-able arguments. Unlike KF_TRUSTED_ARGS which require
_either_ an argument with ref_obj_id > 0, _or_ (ref->type &
BPF_REG_TRUSTED_MODIFIERS) (and no unsafe modifiers allowed), KF_RELEASE
only allows for ref_obj_id > 0. Because KF_RELEASE today doesn't
automatically imply KF_TRUSTED_ARGS, some of these requirements are
enforced in different ways that can make the behavior of the verifier
feel unpredictable. For example, a KF_RELEASE kfunc with a NULL-able
argument will currently fail in the verifier with a message like, "arg#0
is ptr_or_null_ expected ptr_ or socket" rather than "Possibly NULL
pointer passed to trusted arg0". Our intention is the same, but the
semantics are different due to implemenetation details that kfunc authors
and BPF program writers should not need to care about.

Let's make the behavior of the verifier more consistent and intuitive by
having KF_RELEASE kfuncs imply the presence of KF_TRUSTED_ARGS. Our
eventual goal is to have all kfuncs assume KF_TRUSTED_ARGS by default
anyways, so this takes us a step in that direction.

Note that it does not make sense to assume KF_TRUSTED_ARGS for all
KF_ACQUIRE kfuncs. KF_ACQUIRE kfuncs can have looser semantics than
KF_RELEASE, with e.g. KF_RCU | KF_RET_NULL. We may want to have
KF_ACQUIRE imply KF_TRUSTED_ARGS _unless_ KF_RCU is specified, but that
can be left to another patch set, and there are no such subtleties to
address for KF_RELEASE.

Signed-off-by: David Vernet <void@manifault.com>
Link: https://lore.kernel.org/r/20230325213144.486885-4-void@manifault.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

commit | commitdiff | tree

David Vernet [Sat, 25 Mar 2023 21:31:45 +0000 (16:31 -0500)]

bpf: Remove now-unnecessary NULL checks for KF_RELEASE kfuncs

Now that we're not invoking kfunc destructors when the kptr in a map was
NULL, we no longer require NULL checks in many of our KF_RELEASE kfuncs.
This patch removes those NULL checks.

Signed-off-by: David Vernet <void@manifault.com>
Link: https://lore.kernel.org/r/20230325213144.486885-3-void@manifault.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

commit | commitdiff | tree

David Vernet [Sat, 25 Mar 2023 21:31:42 +0000 (16:31 -0500)]

bpf: Only invoke kptr dtor following non-NULL xchg

When a map value is being freed, we loop over all of the fields of the
corresponding BPF object and issue the appropriate cleanup calls
corresponding to the field's type. If the field is a referenced kptr, we
atomically xchg the value out of the map, and invoke the kptr's
destructor on whatever was there before (or bpf_obj_drop() it if it was
a local kptr).

Currently, we always invoke the destructor (either bpf_obj_drop() or the
kptr's registered destructor) on any KPTR_REF-type field in a map, even
if there wasn't a value in the map. This means that any function serving
as the kptr's KF_RELEASE destructor must always treat the argument as
possibly NULL, as the following can and regularly does happen:

void *xchgd_field;

/* No value was in the map, so xchgd_field is NULL */
xchgd_field = (void *)xchg(unsigned long *field_ptr, 0);
field->kptr.dtor(xchgd_field);

These are odd semantics to impose on KF_RELEASE kfuncs -- BPF programs
are prohibited by the verifier from passing NULL pointers to KF_RELEASE
kfuncs, so it doesn't make sense to require this of BPF programs, but
not the main kernel destructor path. It's also unnecessary to invoke any
cleanup logic for local kptrs. If there is no object there, there's
nothing to drop.

So as to allow KF_RELEASE kfuncs to fully assume that an argument is
non-NULL, this patch updates a KPTR_REF's destructor to only be invoked
when a non-NULL value is xchg'd out of the kptr map field.

Signed-off-by: David Vernet <void@manifault.com>
Link: https://lore.kernel.org/r/20230325213144.486885-2-void@manifault.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

commit | commitdiff | tree

Martin KaFai Lau [Fri, 24 Mar 2023 18:42:41 +0000 (11:42 -0700)]

bpf: Check IS_ERR for the bpf_map_get() return value

This patch fixes a mistake in checking NULL instead of
checking IS_ERR for the bpf_map_get() return value.

It also fixes the return value in link_update_map() from -EINVAL
to PTR_ERR(*_map).

Reported-by: syzbot+71ccc0fe37abb458406b@syzkaller.appspotmail.com
Fixes: 68b04864ca42 ("bpf: Create links for BPF struct_ops maps.")
Fixes: aef56f2e918b ("bpf: Update the struct_ops of a bpf_link.")
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Acked-by: Kui-Feng Lee <kuifeng@meta.com>
Acked-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/r/20230324184241.1387437-1-martin.lau@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

commit | commitdiff | tree

Martin KaFai Lau [Thu, 23 Mar 2023 05:49:40 +0000 (22:49 -0700)]

Merge branch 'Transit between BPF TCP congestion controls.'

Kui-Feng Lee says:

====================

Major changes:

- Create bpf_links in the kernel for BPF struct_ops to register and
   unregister it.

- Enables switching between implementations of bpf-tcp-cc under a
   name instantly by replacing the backing struct_ops map of a
   bpf_link.

Previously, BPF struct_ops didn't go off, as even when the user
program creating it was terminated, none of these ever were pinned.
For instance, the TCP congestion control subsystem indirectly
maintains a reference count on the struct_ops of any registered BPF
implemented algorithm. Thus, the algorithm won't be deactivated until
someone deliberately unregisters it.  For compatibility with other BPF
programs, bpf_links have been created to work in coordination with
struct_ops maps. This ensures that the registration and unregistration
of these respective maps is carried out at the start and end of the
bpf_link.

We also faced complications when attempting to replace an existing TCP
congestion control algorithm with a new implementation on the fly. A
struct_ops map was used to register a TCP congestion control algorithm
with a unique name.  We had to either register the alternative
implementation with a new name and move over or unregister the current
one before being able to reregistration with the same name.  To fix
this problem, we can an option to migrate the registration of the
algorithm from struct_ops maps to bpf_links. By modifying the backing
map of a bpf_link, it suddenly becomes possible to replace an existing
TCP congestion control algorithm with ease.
---

The major differences from v11:

- Fix incorrectly setting both old_prog_fd and old_map_fd.

The major differences from v10:

- Add old_map_fd as an additional field instead of an union in
   bpf_link_update_opts.

The major differences from v9:

- Add test case for BPF_F_LINK.  Includes adding old_map_fd to struct
   bpf_link_update_opts in patch 6.

- Return -EPERM instead of -EINVAL when the old map fd doesn't match
   with BPF_F_LINK.

- Fix -EBUSY case in bpf_map__attach_struct_ops().

The major differences form v8:

- Check bpf_struct_ops::{validate,update} in
   bpf_struct_ops_map_alloc()

The major differences from v7:

- Use synchronize_rcu_mult(call_rcu, call_rcu_tasks) to replace
   synchronize_rcu() and synchronize_rcu_tasks().

- Call synchronize_rcu() in tcp_update_congestion_control().

- Handle -EBUSY in bpf_map__attach_struct_ops() to allow a struct_ops
   can be used to create links more than once.  Include a test case.

- Add old_map_fd to bpf_attr and handle BPF_F_REPLACE in
   bpf_struct_ops_map_link_update().

- Remove changes in bpf_dummy_struct_ops.c and add a check of .update
   function pointer of bpf_struct_ops.

The major differences from v6:

- Reword commit logs of the patch 1, 2, and 8.

- Call synchronize_rcu_tasks() as well in bpf_struct_ops_map_free().

- Refactor bpf_struct_ops_map_free() so that
   bpf_struct_ops_map_alloc() can free a struct_ops without waiting
   for a RCU grace period.

The major differences from v5:

- Add a new step to bpf_object__load() to prepare vdata.

- Accept BPF_F_REPLACE.

- Check section IDs in find_struct_ops_map_by_offset()

- Add a test case to check mixing w/ and w/o link struct_ops.

- Add a test case of using struct_ops w/o link to update a link.

- Improve bpf_link__detach_struct_ops() to handle the w/ link case.

The major differences from v4:

- Rebase.

- Reorder patches and merge part 4 to part 2 of the v4.

The major differences from v3:

- Remove bpf_struct_ops_map_free_rcu(), and use synchronize_rcu().

- Improve the commit log of the part 1.

- Before transitioning to the READY state, we conduct a value check
   to ensure that struct_ops can be successfully utilized and links
   created later.

The major differences from v2:

- Simplify states

   - Remove TOBEUNREG.

   - Rename UNREG to READY.

- Stop using the refcnt of the kvalue of a struct_ops. Explicitly
   increase and decrease the refcount of struct_ops.

- Prepare kernel vdata during the load phase of libbpf.

The major differences from v1:

- Added bpf_struct_ops_link to replace the previous union-based
   approach.

- Added UNREG and TOBEUNREG to the state of bpf_struct_ops_map.

   - bpf_struct_ops_transit_state() maintains state transitions.

- Fixed synchronization issue.

- Prepare kernel vdata of struct_ops during the loading phase of
   bpf_object.

- Merged previous patch 3 to patch 1.

v11: https://lore.kernel.org/all/20230323010409.2265383-1-kuifeng@meta.com/
v10: https://lore.kernel.org/all/20230321232813.3376064-1-kuifeng@meta.com/
v9: https://lore.kernel.org/all/20230320195644.1953096-1-kuifeng@meta.com/
v8: https://lore.kernel.org/all/20230318053144.1180301-1-kuifeng@meta.com/
v7: https://lore.kernel.org/all/20230316023641.2092778-1-kuifeng@meta.com/
v6: https://lore.kernel.org/all/20230310043812.3087672-1-kuifeng@meta.com/
v5: https://lore.kernel.org/all/20230308005050.255859-1-kuifeng@meta.com/
v4: https://lore.kernel.org/all/20230307232913.576893-1-andrii@kernel.org/
v3: https://lore.kernel.org/all/20230303012122.852654-1-kuifeng@meta.com/
v2: https://lore.kernel.org/bpf/20230223011238.12313-1-kuifeng@meta.com/
v1: https://lore.kernel.org/bpf/20230214221718.503964-1-kuifeng@meta.com/
====================

Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>

commit | commitdiff | tree

Kui-Feng Lee [Thu, 23 Mar 2023 03:24:05 +0000 (20:24 -0700)]

selftests/bpf: Test switching TCP Congestion Control algorithms.

Create a pair of sockets that utilize the congestion control algorithm
under a particular name. Then switch up this congestion control
algorithm to another implementation and check whether newly created
connections using the same cc name now run the new implementation.

Also, try to update a link with a struct_ops that is without
BPF_F_LINK or with a wrong or different name.  These cases should fail
due to the violation of assumptions.  To update a bpf_link of a
struct_ops, it must be replaced with another struct_ops that is
identical in type and name and has the BPF_F_LINK flag.

The other test case is to create links from the same struct_ops more
than once.  It makes sure a struct_ops can be used repeatly.

Signed-off-by: Kui-Feng Lee <kuifeng@meta.com>
Link: https://lore.kernel.org/r/20230323032405.3735486-9-kuifeng@meta.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>

commit | commitdiff | tree

Kui-Feng Lee [Thu, 23 Mar 2023 03:24:04 +0000 (20:24 -0700)]

libbpf: Use .struct_ops.link section to indicate a struct_ops with a link.

Flags a struct_ops is to back a bpf_link by putting it to the
".struct_ops.link" section. Once it is flagged, the created
struct_ops can be used to create a bpf_link or update a bpf_link that
has been backed by another struct_ops.

Signed-off-by: Kui-Feng Lee <kuifeng@meta.com>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20230323032405.3735486-8-kuifeng@meta.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>

commit | commitdiff | tree

Kui-Feng Lee [Thu, 23 Mar 2023 03:24:03 +0000 (20:24 -0700)]

libbpf: Update a bpf_link with another struct_ops.

Introduce bpf_link__update_map(), which allows to atomically update
underlying struct_ops implementation for given struct_ops BPF link.

Also add old_map_fd to struct bpf_link_update_opts to handle
BPF_F_REPLACE feature.

Signed-off-by: Kui-Feng Lee <kuifeng@meta.com>
Link: https://lore.kernel.org/r/20230323032405.3735486-7-kuifeng@meta.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>

commit | commitdiff | tree

Kui-Feng Lee [Thu, 23 Mar 2023 03:24:02 +0000 (20:24 -0700)]

bpf: Update the struct_ops of a bpf_link.

By improving the BPF_LINK_UPDATE command of bpf(), it should allow you
to conveniently switch between different struct_ops on a single
bpf_link. This would enable smoother transitions from one struct_ops
to another.

The struct_ops maps passing along with BPF_LINK_UPDATE should have the
BPF_F_LINK flag.

Signed-off-by: Kui-Feng Lee <kuifeng@meta.com>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20230323032405.3735486-6-kuifeng@meta.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>

commit | commitdiff | tree

Kui-Feng Lee [Thu, 23 Mar 2023 03:24:01 +0000 (20:24 -0700)]

libbpf: Create a bpf_link in bpf_map__attach_struct_ops().

bpf_map__attach_struct_ops() was creating a dummy bpf_link as a
placeholder, but now it is constructing an authentic one by calling
bpf_link_create() if the map has the BPF_F_LINK flag.

You can flag a struct_ops map with BPF_F_LINK by calling
bpf_map__set_map_flags().

Signed-off-by: Kui-Feng Lee <kuifeng@meta.com>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20230323032405.3735486-5-kuifeng@meta.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>

commit | commitdiff | tree

Kui-Feng Lee [Thu, 23 Mar 2023 03:24:00 +0000 (20:24 -0700)]

bpf: Create links for BPF struct_ops maps.

Make bpf_link support struct_ops. Previously, struct_ops were always
used alone without any associated links. Upon updating its value, a
struct_ops would be activated automatically. Yet other BPF program
types required to make a bpf_link with their instances before they
could become active. Now, however, you can create an inactive
struct_ops, and create a link to activate it later.

With bpf_links, struct_ops has a behavior similar to other BPF program
types. You can pin/unpin them from their links and the struct_ops will
be deactivated when its link is removed while previously need someone
to delete the value for it to be deactivated.

bpf_links are responsible for registering their associated
struct_ops. You can only use a struct_ops that has the BPF_F_LINK flag
set to create a bpf_link, while a structs without this flag behaves in
the same manner as before and is registered upon updating its value.

The BPF_LINK_TYPE_STRUCT_OPS serves a dual purpose. Not only is it
used to craft the links for BPF struct_ops programs, but also to
create links for BPF struct_ops them-self. Since the links of BPF
struct_ops programs are only used to create trampolines internally,
they are never seen in other contexts. Thus, they can be reused for
struct_ops themself.

To maintain a reference to the map supporting this link, we add
bpf_struct_ops_link as an additional type. The pointer of the map is
RCU and won't be necessary until later in the patchset.

Signed-off-by: Kui-Feng Lee <kuifeng@meta.com>
Link: https://lore.kernel.org/r/20230323032405.3735486-4-kuifeng@meta.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>

commit | commitdiff | tree

Kui-Feng Lee [Thu, 23 Mar 2023 03:23:59 +0000 (20:23 -0700)]

net: Update an existing TCP congestion control algorithm.

This feature lets you immediately transition to another congestion
control algorithm or implementation with the same name.  Once a name
is updated, new connections will apply this new algorithm.

The purpose is to update a customized algorithm implemented in BPF
struct_ops with a new version on the flight.  The following is an
example of using the userspace API implemented in later BPF patches.

   link = bpf_map__attach_struct_ops(skel->maps.ca_update_1);
   .......
   err = bpf_link__update_map(link, skel->maps.ca_update_2);

We first load and register an algorithm implemented in BPF struct_ops,
then swap it out with a new one using the same name. After that, newly
created connections will apply the updated algorithm, while older ones
retain the previous version already applied.

This patch also takes this chance to refactor the ca validation into
the new tcp_validate_congestion_control() function.

Cc: netdev@vger.kernel.org, Eric Dumazet <edumazet@google.com>
Signed-off-by: Kui-Feng Lee <kuifeng@meta.com>
Link: https://lore.kernel.org/r/20230323032405.3735486-3-kuifeng@meta.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>

commit | commitdiff | tree

Kui-Feng Lee [Thu, 23 Mar 2023 03:23:58 +0000 (20:23 -0700)]

bpf: Retire the struct_ops map kvalue->refcnt.

We have replaced kvalue-refcnt with synchronize_rcu() to wait for an
RCU grace period.

Maintenance of kvalue->refcnt was a complicated task, as we had to
simultaneously keep track of two reference counts: one for the
reference count of bpf_map. When the kvalue->refcnt reaches zero, we
also have to reduce the reference count on bpf_map - yet these steps
are not performed in an atomic manner and require us to be vigilant
when managing them. By eliminating kvalue->refcnt, we can make our
maintenance more straightforward as the refcount of bpf_map is now
solely managed!

To prevent the trampoline image of a struct_ops from being released
while it is still in use, we wait for an RCU grace period. The
setsockopt(TCP_CONGESTION, "...") command allows you to change your
socket's congestion control algorithm and can result in releasing the
old struct_ops implementation. It is fine. However, this function is
exposed through bpf_setsockopt(), it may be accessed by BPF programs
as well. To ensure that the trampoline image belonging to struct_op
can be safely called while its method is in use, the trampoline
safeguarde the BPF program with rcu_read_lock(). Doing so prevents any
destruction of the associated images before returning from a
trampoline and requires us to wait for an RCU grace period.

Signed-off-by: Kui-Feng Lee <kuifeng@meta.com>
Link: https://lore.kernel.org/r/20230323032405.3735486-2-kuifeng@meta.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>

commit | commitdiff | tree

Andrii Nakryiko [Wed, 22 Mar 2023 23:25:02 +0000 (16:25 -0700)]

bpf: remember meta->iter info only for initialized iters

For iter_new() functions iterator state's slot might not be yet
initialized, in which case iter_get_spi() will return -ERANGE. This is
expected and is handled properly. But for iter_next() and iter_destroy()
cases iter slot is supposed to be initialized and correct, so -ERANGE is
not possible.

Move meta->iter.{spi,frameno} initialization into iter_next/iter_destroy
handling branch to make it more explicit that valid information will be
remembered in meta->iter block for subsequent use in process_iter_next_call(),
avoiding confusingly looking -ERANGE assignment for meta->iter.spi.

Reported-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20230322232502.836171-1-andrii@kernel.org
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>

commit | commitdiff | tree

Xu Kuohai [Wed, 22 Mar 2023 21:30:56 +0000 (22:30 +0100)]

selftests/bpf: Check when bounds are not in the 32-bit range

Add cases to check if bound is updated correctly when 64-bit value is
not in the 32-bit range.

Signed-off-by: Xu Kuohai <xukuohai@huawei.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20230322213056.2470-2-daniel@iogearbox.net

commit | commitdiff | tree

Daniel Borkmann [Wed, 22 Mar 2023 21:30:55 +0000 (22:30 +0100)]

bpf: Fix __reg_bound_offset 64->32 var_off subreg propagation

Xu reports that after commit 3f50f132d840 ("bpf: Verifier, do explicit ALU32
bounds tracking"), the following BPF program is rejected by the verifier:

   0: (61) r2 = *(u32 *)(r1 +0)          ; R2_w=pkt(off=0,r=0,imm=0)
   1: (61) r3 = *(u32 *)(r1 +4)          ; R3_w=pkt_end(off=0,imm=0)
   2: (bf) r1 = r2
   3: (07) r1 += 1
   4: (2d) if r1 > r3 goto pc+8
   5: (71) r1 = *(u8 *)(r2 +0)           ; R1_w=scalar(umax=255,var_off=(0x0; 0xff))
   6: (18) r0 = 0x7fffffffffffff10
   8: (0f) r1 += r0                      ; R1_w=scalar(umin=0x7fffffffffffff10,umax=0x800000000000000f)
   9: (18) r0 = 0x8000000000000000
  11: (07) r0 += 1
  12: (ad) if r0 < r1 goto pc-2
  13: (b7) r0 = 0
  14: (95) exit

And the verifier log says:

  func#0 @0
  0: R1=ctx(off=0,imm=0) R10=fp0
  0: (61) r2 = *(u32 *)(r1 +0)          ; R1=ctx(off=0,imm=0) R2_w=pkt(off=0,r=0,imm=0)
  1: (61) r3 = *(u32 *)(r1 +4)          ; R1=ctx(off=0,imm=0) R3_w=pkt_end(off=0,imm=0)
  2: (bf) r1 = r2                       ; R1_w=pkt(off=0,r=0,imm=0) R2_w=pkt(off=0,r=0,imm=0)
  3: (07) r1 += 1                       ; R1_w=pkt(off=1,r=0,imm=0)
  4: (2d) if r1 > r3 goto pc+8          ; R1_w=pkt(off=1,r=1,imm=0) R3_w=pkt_end(off=0,imm=0)
  5: (71) r1 = *(u8 *)(r2 +0)           ; R1_w=scalar(umax=255,var_off=(0x0; 0xff)) R2_w=pkt(off=0,r=1,imm=0)
  6: (18) r0 = 0x7fffffffffffff10       ; R0_w=9223372036854775568
  8: (0f) r1 += r0                      ; R0_w=9223372036854775568 R1_w=scalar(umin=9223372036854775568,umax=9223372036854775823,s32_min=-240,s32_max=15)
  9: (18) r0 = 0x8000000000000000       ; R0_w=-9223372036854775808
  11: (07) r0 += 1                      ; R0_w=-9223372036854775807
  12: (ad) if r0 < r1 goto pc-2         ; R0_w=-9223372036854775807 R1_w=scalar(umin=9223372036854775568,umax=9223372036854775809)
  13: (b7) r0 = 0                       ; R0_w=0
  14: (95) exit

  from 12 to 11: R0_w=-9223372036854775807 R1_w=scalar(umin=9223372036854775810,umax=9223372036854775823,var_off=(0x8000000000000000; 0xffffffff)) R2_w=pkt(off=0,r=1,imm=0) R3_w=pkt_end(off=0,imm=0) R10=fp0
  11: (07) r0 += 1                      ; R0_w=-9223372036854775806
  12: (ad) if r0 < r1 goto pc-2         ; R0_w=-9223372036854775806 R1_w=scalar(umin=9223372036854775810,umax=9223372036854775810,var_off=(0x8000000000000000; 0xffffffff))
  13: safe

  [...]

  from 12 to 11: R0_w=-9223372036854775795 R1=scalar(umin=9223372036854775822,umax=9223372036854775823,var_off=(0x8000000000000000; 0xffffffff)) R2=pkt(off=0,r=1,imm=0) R3=pkt_end(off=0,imm=0) R10=fp0
  11: (07) r0 += 1                      ; R0_w=-9223372036854775794
  12: (ad) if r0 < r1 goto pc-2         ; R0_w=-9223372036854775794 R1=scalar(umin=9223372036854775822,umax=9223372036854775822,var_off=(0x8000000000000000; 0xffffffff))
  13: safe

  from 12 to 11: R0_w=-9223372036854775794 R1=scalar(umin=9223372036854775823,umax=9223372036854775823,var_off=(0x8000000000000000; 0xffffffff)) R2=pkt(off=0,r=1,imm=0) R3=pkt_end(off=0,imm=0) R10=fp0
  11: (07) r0 += 1                      ; R0_w=-9223372036854775793
  12: (ad) if r0 < r1 goto pc-2         ; R0_w=-9223372036854775793 R1=scalar(umin=9223372036854775823,umax=9223372036854775823,var_off=(0x8000000000000000; 0xffffffff))
  13: safe

  from 12 to 11: R0_w=-9223372036854775793 R1=scalar(umin=9223372036854775824,umax=9223372036854775823,var_off=(0x8000000000000000; 0xffffffff)) R2=pkt(off=0,r=1,imm=0) R3=pkt_end(off=0,imm=0) R10=fp0
  11: (07) r0 += 1                      ; R0_w=-9223372036854775792
  12: (ad) if r0 < r1 goto pc-2         ; R0_w=-9223372036854775792 R1=scalar(umin=9223372036854775824,umax=9223372036854775823,var_off=(0x8000000000000000; 0xffffffff))
  13: safe

  [...]

The 64bit umin=9223372036854775810 bound continuously bumps by +1 while
umax=9223372036854775823 stays as-is until the verifier complexity limit
is reached and the program gets finally rejected. During this simulation,
the umin also eventually surpasses umax. Looking at the first 'from 12
to 11' output line from the loop, R1 has the following state:

  R1_w=scalar(umin=0x8000000000000002 (9223372036854775810),
              umax=0x800000000000000f (9223372036854775823),
          var_off=(0x8000000000000000;
                           0xffffffff))

The var_off has technically not an inconsistent state but it's very
imprecise and far off surpassing 64bit umax bounds whereas the expected
output with refined known bits in var_off should have been like:

  R1_w=scalar(umin=0x8000000000000002 (9223372036854775810),
              umax=0x800000000000000f (9223372036854775823),
          var_off=(0x8000000000000000;
                                  0xf))

In the above log, var_off stays as var_off=(0x8000000000000000; 0xffffffff)
and does not converge into a narrower mask where more bits become known,
eventually transforming R1 into a constant upon umin=9223372036854775823,
umax=9223372036854775823 case where the verifier would have terminated and
let the program pass.

The __reg_combine_64_into_32() marks the subregister unknown and propagates
64bit {s,u}min/{s,u}max bounds to their 32bit equivalents iff they are within
the 32bit universe. The question came up whether __reg_combine_64_into_32()
should special case the situation that when 64bit {s,u}min bounds have
the same value as 64bit {s,u}max bounds to then assign the latter as
well to the 32bit reg->{s,u}32_{min,max}_value. As can be seen from the
above example however, that is just /one/ special case and not a /generic/
solution given above example would still not be addressed this way and
remain at an imprecise var_off=(0x8000000000000000; 0xffffffff).

The improvement is needed in __reg_bound_offset() to refine var32_off with
the updated var64_off instead of the prior reg->var_off. The reg_bounds_sync()
code first refines information about the register's min/max bounds via
__update_reg_bounds() from the current var_off, then in __reg_deduce_bounds()
from sign bit and with the potentially learned bits from bounds it'll
update the var_off tnum in __reg_bound_offset(). For example, intersecting
with the old var_off might have improved bounds slightly, e.g. if umax
was 0x7f...f and var_off was (0; 0xf...fc), then new var_off will then
result in (0; 0x7f...fc). The intersected var64_off holds then the
universe which is a superset of var32_off. The point for the latter is
not to broaden, but to further refine known bits based on the intersection
of var_off with 32 bit bounds, so that we later construct the final var_off
from upper and lower 32 bits. The final __update_reg_bounds() can then
potentially still slightly refine bounds if more bits became known from the
new var_off.

After the improvement, we can see R1 converging successively:

  func#0 @0
  0: R1=ctx(off=0,imm=0) R10=fp0
  0: (61) r2 = *(u32 *)(r1 +0)          ; R1=ctx(off=0,imm=0) R2_w=pkt(off=0,r=0,imm=0)
  1: (61) r3 = *(u32 *)(r1 +4)          ; R1=ctx(off=0,imm=0) R3_w=pkt_end(off=0,imm=0)
  2: (bf) r1 = r2                       ; R1_w=pkt(off=0,r=0,imm=0) R2_w=pkt(off=0,r=0,imm=0)
  3: (07) r1 += 1                       ; R1_w=pkt(off=1,r=0,imm=0)
  4: (2d) if r1 > r3 goto pc+8          ; R1_w=pkt(off=1,r=1,imm=0) R3_w=pkt_end(off=0,imm=0)
  5: (71) r1 = *(u8 *)(r2 +0)           ; R1_w=scalar(umax=255,var_off=(0x0; 0xff)) R2_w=pkt(off=0,r=1,imm=0)
  6: (18) r0 = 0x7fffffffffffff10       ; R0_w=9223372036854775568
  8: (0f) r1 += r0                      ; R0_w=9223372036854775568 R1_w=scalar(umin=9223372036854775568,umax=9223372036854775823,s32_min=-240,s32_max=15)
  9: (18) r0 = 0x8000000000000000       ; R0_w=-9223372036854775808
  11: (07) r0 += 1                      ; R0_w=-9223372036854775807
  12: (ad) if r0 < r1 goto pc-2         ; R0_w=-9223372036854775807 R1_w=scalar(umin=9223372036854775568,umax=9223372036854775809)
  13: (b7) r0 = 0                       ; R0_w=0
  14: (95) exit

  from 12 to 11: R0_w=-9223372036854775807 R1_w=scalar(umin=9223372036854775810,umax=9223372036854775823,var_off=(0x8000000000000000; 0xf),s32_min=0,s32_max=15,u32_max=15) R2_w=pkt(off=0,r=1,imm=0) R3_w=pkt_end(off=0,imm=0) R10=fp0
  11: (07) r0 += 1                      ; R0_w=-9223372036854775806
  12: (ad) if r0 < r1 goto pc-2         ; R0_w=-9223372036854775806 R1_w=-9223372036854775806
  13: safe

  from 12 to 11: R0_w=-9223372036854775806 R1_w=scalar(umin=9223372036854775811,umax=9223372036854775823,var_off=(0x8000000000000000; 0xf),s32_min=0,s32_max=15,u32_max=15) R2_w=pkt(off=0,r=1,imm=0) R3_w=pkt_end(off=0,imm=0) R10=fp0
  11: (07) r0 += 1                      ; R0_w=-9223372036854775805
  12: (ad) if r0 < r1 goto pc-2         ; R0_w=-9223372036854775805 R1_w=-9223372036854775805
  13: safe

  [...]

  from 12 to 11: R0_w=-9223372036854775798 R1=scalar(umin=9223372036854775819,umax=9223372036854775823,var_off=(0x8000000000000008; 0x7),s32_min=8,s32_max=15,u32_min=8,u32_max=15) R2=pkt(off=0,r=1,imm=0) R3=pkt_end(off=0,imm=0) R10=fp0
  11: (07) r0 += 1                      ; R0_w=-9223372036854775797
  12: (ad) if r0 < r1 goto pc-2         ; R0_w=-9223372036854775797 R1=-9223372036854775797
  13: safe

  from 12 to 11: R0_w=-9223372036854775797 R1=scalar(umin=9223372036854775820,umax=9223372036854775823,var_off=(0x800000000000000c; 0x3),s32_min=12,s32_max=15,u32_min=12,u32_max=15) R2=pkt(off=0,r=1,imm=0) R3=pkt_end(off=0,imm=0) R10=fp0
  11: (07) r0 += 1                      ; R0_w=-9223372036854775796
  12: (ad) if r0 < r1 goto pc-2         ; R0_w=-9223372036854775796 R1=-9223372036854775796
  13: safe

  from 12 to 11: R0_w=-9223372036854775796 R1=scalar(umin=9223372036854775821,umax=9223372036854775823,var_off=(0x800000000000000c; 0x3),s32_min=12,s32_max=15,u32_min=12,u32_max=15) R2=pkt(off=0,r=1,imm=0) R3=pkt_end(off=0,imm=0) R10=fp0
  11: (07) r0 += 1                      ; R0_w=-9223372036854775795
  12: (ad) if r0 < r1 goto pc-2         ; R0_w=-9223372036854775795 R1=-9223372036854775795
  13: safe

  from 12 to 11: R0_w=-9223372036854775795 R1=scalar(umin=9223372036854775822,umax=9223372036854775823,var_off=(0x800000000000000e; 0x1),s32_min=14,s32_max=15,u32_min=14,u32_max=15) R2=pkt(off=0,r=1,imm=0) R3=pkt_end(off=0,imm=0) R10=fp0
  11: (07) r0 += 1                      ; R0_w=-9223372036854775794
  12: (ad) if r0 < r1 goto pc-2         ; R0_w=-9223372036854775794 R1=-9223372036854775794
  13: safe

  from 12 to 11: R0_w=-9223372036854775794 R1=-9223372036854775793 R2=pkt(off=0,r=1,imm=0) R3=pkt_end(off=0,imm=0) R10=fp0
  11: (07) r0 += 1                      ; R0_w=-9223372036854775793
  12: (ad) if r0 < r1 goto pc-2
  last_idx 12 first_idx 12
  parent didn't have regs=1 stack=0 marks: R0_rw=P-9223372036854775801 R1_r=scalar(umin=9223372036854775815,umax=9223372036854775823,var_off=(0x8000000000000000; 0xf),s32_min=0,s32_max=15,u32_max=15) R2=pkt(off=0,r=1,imm=0) R3=pkt_end(off=0,imm=0) R10=fp0
  last_idx 11 first_idx 11
  regs=1 stack=0 before 11: (07) r0 += 1
  parent didn't have regs=1 stack=0 marks: R0_rw=P-9223372036854775805 R1_rw=scalar(umin=9223372036854775812,umax=9223372036854775823,var_off=(0x8000000000000000; 0xf),s32_min=0,s32_max=15,u32_max=15) R2_w=pkt(off=0,r=1,imm=0) R3_w=pkt_end(off=0,imm=0) R10=fp0
  last_idx 12 first_idx 0
  regs=1 stack=0 before 12: (ad) if r0 < r1 goto pc-2
  regs=1 stack=0 before 11: (07) r0 += 1
  regs=1 stack=0 before 12: (ad) if r0 < r1 goto pc-2
  regs=1 stack=0 before 11: (07) r0 += 1
  regs=1 stack=0 before 12: (ad) if r0 < r1 goto pc-2
  regs=1 stack=0 before 11: (07) r0 += 1
  regs=1 stack=0 before 9: (18) r0 = 0x8000000000000000
  last_idx 12 first_idx 12
  parent didn't have regs=2 stack=0 marks: R0_rw=P-9223372036854775801 R1_r=Pscalar(umin=9223372036854775815,umax=9223372036854775823,var_off=(0x8000000000000000; 0xf),s32_min=0,s32_max=15,u32_max=15) R2=pkt(off=0,r=1,imm=0) R3=pkt_end(off=0,imm=0) R10=fp0
  last_idx 11 first_idx 11
  regs=2 stack=0 before 11: (07) r0 += 1
  parent didn't have regs=2 stack=0 marks: R0_rw=P-9223372036854775805 R1_rw=Pscalar(umin=9223372036854775812,umax=9223372036854775823,var_off=(0x8000000000000000; 0xf),s32_min=0,s32_max=15,u32_max=15) R2_w=pkt(off=0,r=1,imm=0) R3_w=pkt_end(off=0,imm=0) R10=fp0
  last_idx 12 first_idx 0
  regs=2 stack=0 before 12: (ad) if r0 < r1 goto pc-2
  regs=2 stack=0 before 11: (07) r0 += 1
  regs=2 stack=0 before 12: (ad) if r0 < r1 goto pc-2
  regs=2 stack=0 before 11: (07) r0 += 1
  regs=2 stack=0 before 12: (ad) if r0 < r1 goto pc-2
  regs=2 stack=0 before 11: (07) r0 += 1
  regs=2 stack=0 before 9: (18) r0 = 0x8000000000000000
  regs=2 stack=0 before 8: (0f) r1 += r0
  regs=3 stack=0 before 6: (18) r0 = 0x7fffffffffffff10
  regs=2 stack=0 before 5: (71) r1 = *(u8 *)(r2 +0)
  13: safe

  from 4 to 13: safe
  verification time 322 usec
  stack depth 0
  processed 56 insns (limit 1000000) max_states_per_insn 1 total_states 3 peak_states 3 mark_read 1

This also fixes up a test case along with this improvement where we match
on the verifier log. The updated log now has a refined var_off, too.

Fixes: 3f50f132d840 ("bpf: Verifier, do explicit ALU32 bounds tracking")
Reported-by: Xu Kuohai <xukuohai@huaweicloud.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Reviewed-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20230314203424.4015351-2-xukuohai@huaweicloud.com
Link: https://lore.kernel.org/bpf/20230322213056.2470-1-daniel@iogearbox.net

commit | commitdiff | tree

Alexei Starovoitov [Wed, 22 Mar 2023 22:11:07 +0000 (15:11 -0700)]

Merge branch 'error checking where helpers call bpf_map_ops'

JP Kobryn says:

====================

Within bpf programs, the bpf helper functions can make inline calls to
kernel functions. In this scenario there can be a disconnect between the
register the kernel function writes a return value to and the register the
bpf program uses to evaluate that return value.

As an example, this bpf code:

long err = bpf_map_update_elem(...);
if (err && err != -EEXIST)
// got some error other than -EEXIST

...can result in the bpf assembly:

; err = bpf_map_update_elem(&mymap, &key, &val, BPF_NOEXIST);
  37: movabs $0xffff976a10730400,%rdi
  41: mov    $0x1,%ecx
  46: call   0xffffffffe103291c ; htab_map_update_elem
; if (err && err != -EEXIST) {
  4b: cmp    $0xffffffffffffffef,%rax ; cmp -EEXIST,%rax
  4f: je     0x000000000000008e
  51: test   %rax,%rax
  54: je     0x000000000000008e

The compare operation here evaluates %rax, while in the preceding call to
htab_map_update_elem the corresponding assembly returns -EEXIST via %eax
(the lower 32 bits of %rax):

movl $0xffffffef, %r9d
...
movl %r9d, %eax

...since it's returning int (32-bit). So the resulting comparison becomes:

cmp $0xffffffffffffffef, $0x00000000ffffffef

...making it not possible to check for negative errors or specific errors,
since the sign value is left at the 32nd bit. It means in the original
example, the conditional branch will be entered even when the error is
-EEXIST, which was not intended.

The selftests added cover these cases for the different bpf_map_ops
functions. When the second patch is applied, changing the return type of
those functions to long, the comparison works as intended and the tests
pass.
====================

Signed-off-by: Alexei Starovoitov <ast@kernel.org>

commit | commitdiff | tree

JP Kobryn [Wed, 22 Mar 2023 19:47:54 +0000 (12:47 -0700)]

bpf: return long from bpf_map_ops funcs

This patch changes the return types of bpf_map_ops functions to long, where
previously int was returned. Using long allows for bpf programs to maintain
the sign bit in the absence of sign extension during situations where
inlined bpf helper funcs make calls to the bpf_map_ops funcs and a negative
error is returned.

The definitions of the helper funcs are generated from comments in the bpf
uapi header at `include/uapi/linux/bpf.h`. The return type of these
helpers was previously changed from int to long in commit bdb7b79b4ce8. For
any case where one of the map helpers call the bpf_map_ops funcs that are
still returning 32-bit int, a compiler might not include sign extension
instructions to properly convert the 32-bit negative value a 64-bit
negative value.

For example:
bpf assembly excerpt of an inlined helper calling a kernel function and
checking for a specific error:

; err = bpf_map_update_elem(&mymap, &key, &val, BPF_NOEXIST);
  ...
  46: call   0xffffffffe103291c ; htab_map_update_elem
; if (err && err != -EEXIST) {
  4b: cmp    $0xffffffffffffffef,%rax ; cmp -EEXIST,%rax

kernel function assembly excerpt of return value from
`htab_map_update_elem` returning 32-bit int:

movl $0xffffffef, %r9d
...
movl %r9d, %eax

...results in the comparison:
cmp $0xffffffffffffffef, $0x00000000ffffffef

Fixes: bdb7b79b4ce8 ("bpf: Switch most helper return values from 32-bit int to 64-bit long")
Tested-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: JP Kobryn <inwardvessel@gmail.com>
Link: https://lore.kernel.org/r/20230322194754.185781-3-inwardvessel@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

commit | commitdiff | tree

JP Kobryn [Wed, 22 Mar 2023 19:47:53 +0000 (12:47 -0700)]

bpf/selftests: coverage for bpf_map_ops errors

These tests expose the issue of being unable to properly check for errors
returned from inlined bpf map helpers that make calls to the bpf_map_ops
functions. At best, a check for zero or non-zero can be done but these
tests show it is not possible to check for a negative value or for a
specific error value.

Signed-off-by: JP Kobryn <inwardvessel@gmail.com>
Tested-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20230322194754.185781-2-inwardvessel@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

commit | commitdiff | tree

Andrii Nakryiko [Wed, 22 Mar 2023 16:31:05 +0000 (09:31 -0700)]

Merge branch 'bpf: Support ksym detection in light skeleton.'

Alexei Starovoitov says:

====================

From: Alexei Starovoitov <ast@kernel.org>

v1->v2: update denylist on s390

Patch 1: Cleanup internal libbpf names.
Patch 2: Teach the verifier that rdonly_mem != NULL.
Patch 3: Fix gen_loader to support ksym detection.
Patch 4: Selftest and update denylist.
====================

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>

commit | commitdiff | tree

Alexei Starovoitov [Tue, 21 Mar 2023 20:38:54 +0000 (13:38 -0700)]

selftests/bpf: Add light skeleton test for kfunc detection.

Add light skeleton test for kfunc detection and denylist it for s390.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20230321203854.3035-5-alexei.starovoitov@gmail.com

commit | commitdiff | tree

Alexei Starovoitov [Tue, 21 Mar 2023 20:38:53 +0000 (13:38 -0700)]

libbpf: Support kfunc detection in light skeleton.

Teach gen_loader to find {btf_id, btf_obj_fd} of kernel variables and kfuncs
and populate corresponding ld_imm64 and bpf_call insns.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20230321203854.3035-4-alexei.starovoitov@gmail.com

commit | commitdiff | tree

Alexei Starovoitov [Tue, 21 Mar 2023 20:38:52 +0000 (13:38 -0700)]

bpf: Teach the verifier to recognize rdonly_mem as not null.

Teach the verifier to recognize PTR_TO_MEM | MEM_RDONLY as not NULL
otherwise if (!bpf_ksym_exists(known_kfunc)) doesn't go through
dead code elimination.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: David Vernet <void@manifault.com>
Link: https://lore.kernel.org/bpf/20230321203854.3035-3-alexei.starovoitov@gmail.com

commit | commitdiff | tree

Alexei Starovoitov [Tue, 21 Mar 2023 20:38:51 +0000 (13:38 -0700)]

libbpf: Rename RELO_EXTERN_VAR/FUNC.

RELO_EXTERN_VAR/FUNC names are not correct anymore. RELO_EXTERN_VAR represent
ksym symbol in ld_imm64 insn. It can point to kernel variable or kfunc.
Rename RELO_EXTERN_VAR->RELO_EXTERN_LD64 and RELO_EXTERN_FUNC->RELO_EXTERN_CALL
to match what they actually represent.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: David Vernet <void@manifault.com>
Link: https://lore.kernel.org/bpf/20230321203854.3035-2-alexei.starovoitov@gmail.com

commit | commitdiff | tree

Tushar Vyavahare [Mon, 20 Mar 2023 10:27:05 +0000 (15:57 +0530)]

selftests/xsk: add xdp populate metadata test

Add a new test in copy-mode for testing the copying of metadata from the
buffer in kernel-space to user-space. This is accomplished by adding a
new XDP program and using the bss map to store a counter that is written
to the metadata field. This counter is incremented for every packet so
that the number becomes unique and should be the same as the payload. It
is store in the bss so the value can be reset between runs.

The XDP program populates the metadata and the userspace program checks
the value stored in the metadata field against the payload using the new
is_metadata_correct() function. To turn this verification on or off, add
a new parameter (use_metadata) to the ifobject structure.

Signed-off-by: Tushar Vyavahare <tushar.vyavahare@intel.com>
Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Link: https://lore.kernel.org/r/20230320102705.306187-1-tushar.vyavahare@intel.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

commit | commitdiff | tree

Martin KaFai Lau [Tue, 21 Mar 2023 04:57:50 +0000 (21:57 -0700)]

Merge branch 'net: skbuff: skb bitfield compaction - bpf'

Jakub Kicinski says:

====================

I'm trying to make more of the sk_buff bits optional.
Move the BPF-accessed bits a little - because they must
be at coding-time-constant offsets they must precede any
optional bit. While at it clean up the naming a bit.

v1: https://lore.kernel.org/all/20230308003159.441580-1-kuba@kernel.org/
====================

Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>

commit | commitdiff | tree

Jakub Kicinski [Tue, 21 Mar 2023 01:41:15 +0000 (18:41 -0700)]

net: skbuff: move the fields BPF cares about directly next to the offset marker

To avoid more possible BPF dependencies with moving bitfields
around keep the fields BPF cares about right next to the offset
marker.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Link: https://lore.kernel.org/r/20230321014115.997841-4-kuba@kernel.org
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>

commit | commitdiff | tree

Jakub Kicinski [Tue, 21 Mar 2023 01:41:14 +0000 (18:41 -0700)]

net: skbuff: reorder bytes 2 and 3 of the bitfield

BPF needs to know the offsets of fields it tries to access.
Zero-length fields are added to make offsetof() work.
This unfortunately partitions the bitfield (fields across
the zero-length members can't be coalesced).

Reorder bytes 2 and 3, BPF needs to know the offset of fields
previously in byte 3 and some fields in byte 2 should really
be optional.

The two bytes are always in the same cacheline so it should
not matter.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Link: https://lore.kernel.org/r/20230321014115.997841-3-kuba@kernel.org
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>

commit | commitdiff | tree

Jakub Kicinski [Tue, 21 Mar 2023 01:41:13 +0000 (18:41 -0700)]

net: skbuff: rename __pkt_vlan_present_offset to __mono_tc_offset

vlan_present is gone since
commit 354259fa73e2 ("net: remove skb->vlan_present")
rename the offset field to what BPF is currently looking
for in this byte - mono_delivery_time and tc_at_ingress.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Link: https://lore.kernel.org/r/20230321014115.997841-2-kuba@kernel.org
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>

commit | commitdiff | tree

Liu Pan [Mon, 20 Mar 2023 03:07:20 +0000 (11:07 +0800)]

libbpf: Explicitly call write to append content to file

Write data to fd by calling "vdprintf", in most implementations
of the standard library, the data is finally written by the writev syscall.
But "uprobe_events/kprobe_events" does not allow segmented writes,
so switch the "append_to_file" function to explicit write() call.

Signed-off-by: Liu Pan <patteliu@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20230320030720.650-1-patteliu@gmail.com

commit | commitdiff | tree

Alexei Starovoitov [Sun, 19 Mar 2023 20:30:14 +0000 (13:30 -0700)]

selftest/bpf: Add a test case for ld_imm64 copy logic.

Add a test case to exercise {btf_id, btf_obj_fd} copy logic between ld_imm64 insns.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20230319203014.55866-2-alexei.starovoitov@gmail.com

commit | commitdiff | tree

Alexei Starovoitov [Sun, 19 Mar 2023 20:30:13 +0000 (13:30 -0700)]

libbpf: Fix ld_imm64 copy logic for ksym in light skeleton.

Unlike normal libbpf the light skeleton 'loader' program is doing
btf_find_by_name_kind() call at run-time to find ksym in the kernel and
populate its {btf_id, btf_obj_fd} pair in ld_imm64 insn. To avoid doing the
search multiple times for the same ksym it remembers the first patched ld_imm64
insn and copies {btf_id, btf_obj_fd} from it into subsequent ld_imm64 insn.
Fix a bug in copying logic, since it may incorrectly clear BPF_PSEUDO_BTF_ID flag.

Also replace always true if (btf_obj_fd >= 0) check with unconditional JMP_JA
to clarify the code.

Fixes: d995816b77eb ("libbpf: Avoid reload of imm for weak, unresolved, repeating ksym")
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20230319203014.55866-1-alexei.starovoitov@gmail.com

commit | commitdiff | tree

Sreevani Sreejith [Wed, 15 Mar 2023 19:54:05 +0000 (12:54 -0700)]

bpf, docs: Libbpf overview documentation

This patch documents overview of libbpf, including its features for
developing BPF programs.

Signed-off-by: Sreevani Sreejith <ssreevani@meta.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: David Vernet <void@manifault.com>
Link: https://lore.kernel.org/bpf/20230315195405.2051559-1-ssreevani@meta.com

commit | commitdiff | tree

Manu Bretelle [Fri, 17 Mar 2023 16:32:56 +0000 (09:32 -0700)]

selftests/bpf: Add --json-summary option to test_progs

Currently, test_progs outputs all stdout/stderr as it runs, and when it
is done, prints a summary.

It is non-trivial for tooling to parse that output and extract meaningful
information from it.

This change adds a new option, `--json-summary`/`-J` that let the caller
specify a file where `test_progs{,-no_alu32}` can write a summary of the
run in a json format that can later be parsed by tooling.

Currently, it creates a summary section with successes/skipped/failures
followed by a list of failed tests and subtests.

A test contains the following fields:
- name: the name of the test
- number: the number of the test
- message: the log message that was printed by the test.
- failed: A boolean indicating whether the test failed or not. Currently
we only output failed tests, but in the future, successful tests could
be added.
- subtests: A list of subtests associated with this test.

A subtest contains the following fields:
- name: same as above
- number: sanme as above
- message: the log message that was printed by the subtest.
- failed: same as above but for the subtest

An example run and json content below:
```
$ sudo ./test_progs -a $(grep -v '^#' ./DENYLIST.aarch64 | awk '{print
$1","}' | tr -d '\n') -j -J /tmp/test_progs.json
$ jq < /tmp/test_progs.json | head -n 30
{
  "success": 29,
  "success_subtest": 23,
  "skipped": 3,
  "failed": 28,
  "results": [
    {
      "name": "bpf_cookie",
      "number": 10,
      "message": "test_bpf_cookie:PASS:skel_open 0 nsec\n",
      "failed": true,
      "subtests": [
        {
          "name": "multi_kprobe_link_api",
          "number": 2,
          "message": "kprobe_multi_link_api_subtest:PASS:load_kallsyms 0 nsec\nlibbpf: extern 'bpf_testmod_fentry_test1' (strong): not resolved\nlibbpf: failed to load object 'kprobe_multi'\nlibbpf: failed to load BPF skeleton 'kprobe_multi': -3\nkprobe_multi_link_api_subtest:FAIL:fentry_raw_skel_load unexpected error: -3\n",
          "failed": true
        },
        {
          "name": "multi_kprobe_attach_api",
          "number": 3,
          "message": "libbpf: extern 'bpf_testmod_fentry_test1' (strong): not resolved\nlibbpf: failed to load object 'kprobe_multi'\nlibbpf: failed to load BPF skeleton 'kprobe_multi': -3\nkprobe_multi_attach_api_subtest:FAIL:fentry_raw_skel_load unexpected error: -3\n",
          "failed": true
        },
        {
          "name": "lsm",
          "number": 8,
          "message": "lsm_subtest:PASS:lsm.link_create 0 nsec\nlsm_subtest:FAIL:stack_mprotect unexpected stack_mprotect: actual 0 != expected -1\n",
          "failed": true
        }
```

The file can then be used to print a summary of the test run and list of
failing tests/subtests:

```
$ jq -r < /tmp/test_progs.json '"Success: \(.success)/\(.success_subtest), Skipped: \(.skipped), Failed: \(.failed)"'

Success: 29/23, Skipped: 3, Failed: 28
$ jq -r < /tmp/test_progs.json '.results | map([
    if .failed then "#\(.number) \(.name)" else empty end,
    (
        . as {name: $tname, number: $tnum} | .subtests | map(
            if .failed then "#\($tnum)/\(.number) \($tname)/\(.name)" else empty end
        )
    )
]) | flatten | .[]' | head -n 20
#10 bpf_cookie
#10/2 bpf_cookie/multi_kprobe_link_api
#10/3 bpf_cookie/multi_kprobe_attach_api
#10/8 bpf_cookie/lsm
#15 bpf_mod_race
#15/1 bpf_mod_race/ksym (used_btfs UAF)
#15/2 bpf_mod_race/kfunc (kfunc_btf_tab UAF)
#36 cgroup_hierarchical_stats
#61 deny_namespace
#61/1 deny_namespace/unpriv_userns_create_no_bpf
#73 fexit_stress
#83 get_func_ip_test
#99 kfunc_dynptr_param
#99/1 kfunc_dynptr_param/dynptr_data_null
#99/4 kfunc_dynptr_param/dynptr_data_null
#100 kprobe_multi_bench_attach
#100/1 kprobe_multi_bench_attach/kernel
#100/2 kprobe_multi_bench_attach/modules
#101 kprobe_multi_test
#101/1 kprobe_multi_test/skel_api
```

Signed-off-by: Manu Bretelle <chantr4@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20230317163256.3809328-1-chantr4@gmail.com

commit | commitdiff | tree

Andrii Nakryiko [Fri, 17 Mar 2023 22:44:27 +0000 (15:44 -0700)]

Merge branch 'bpf: Add detection of kfuncs.'

Alexei Starovoitov says:

====================

From: Alexei Starovoitov <ast@kernel.org>

Allow BPF programs detect at load time whether particular kfunc exists.

Patch 1: Allow ld_imm64 to point to kfunc in the kernel.
Patch 2: Fix relocation of kfunc in ld_imm64 insn when kfunc is in kernel module.
Patch 3: Introduce bpf_ksym_exists() macro.
Patch 4: selftest.

NOTE: detection of kfuncs from light skeleton is not supported yet.
====================

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>

commit | commitdiff | tree

Alexei Starovoitov [Fri, 17 Mar 2023 20:19:20 +0000 (13:19 -0700)]

selftests/bpf: Add test for bpf_ksym_exists().

Add load and run time test for bpf_ksym_exists() and check that the verifier
performs dead code elimination for non-existing kfunc.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Reviewed-by: Martin KaFai Lau <martin.lau@kernel.org>
Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20230317201920.62030-5-alexei.starovoitov@gmail.com

commit | commitdiff | tree

Alexei Starovoitov [Fri, 17 Mar 2023 20:19:19 +0000 (13:19 -0700)]

libbpf: Introduce bpf_ksym_exists() macro.

Introduce bpf_ksym_exists() macro that can be used by BPF programs
to detect at load time whether particular ksym (either variable or kfunc)
is present in the kernel.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20230317201920.62030-4-alexei.starovoitov@gmail.com

commit | commitdiff | tree

Alexei Starovoitov [Fri, 17 Mar 2023 20:19:18 +0000 (13:19 -0700)]

libbpf: Fix relocation of kfunc ksym in ld_imm64 insn.

void *p = kfunc; -> generates ld_imm64 insn.
kfunc() -> generates bpf_call insn.

libbpf patches bpf_call insn correctly while only btf_id part of ld_imm64 is
set in the former case. Which means that pointers to kfuncs in modules are not
patched correctly and the verifier rejects load of such programs due to btf_id
being out of range. Fix libbpf to patch ld_imm64 for kfunc.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20230317201920.62030-3-alexei.starovoitov@gmail.com

Domain: System / Kernel;

RSS Atom