bpf: Migrate bpf_rbtree_remove to possibly fail
author Dave Marchevsky <davemarchevsky@fb.com>
Sat, 15 Apr 2023 20:18:09 +0000 (13:18 -0700)
committer Alexei Starovoitov <ast@kernel.org>
Sun, 16 Apr 2023 00:36:50 +0000 (17:36 -0700)
This patch modifies bpf_rbtree_remove to account for possible failure
due to the input rb_node not being in any collection. The function can
now return NULL, and does so when that scenario occurs. As before, on
successful removal an owning reference to the removed node is returned.

Adding KF_RET_NULL to bpf_rbtree_remove's kfunc flags - now KF_RET_NULL |
KF_ACQUIRE - provides the desired verifier semantics:

  * retval must be checked for NULL before use
  * if NULL, retval's ref_obj_id is released
  * retval is a "maybe acquired" owning ref, not a non-owning ref,
    so it will live past end of critical section (bpf_spin_unlock), and
    thus can be checked for NULL after the end of the CS

BPF programs must add checks
============================

This does change bpf_rbtree_remove's verifier behavior. BPF program
writers will need to add NULL checks to their programs, but the
resulting UX looks natural:

  bpf_spin_lock(&glock);

  n = bpf_rbtree_first(&ghead);
  if (!n) { /* ... */ }
  res = bpf_rbtree_remove(&ghead, &n->node);

  bpf_spin_unlock(&glock);

  if (!res)  /* Newly-added check after this patch */
    return 1;

  n = container_of(res, /* ... */);
  /* Do something else with n */
  bpf_obj_drop(n);
  return 0;

The "if (!res)" check above is the only addition necessary for the above
program to pass verification after this patch.

bpf_rbtree_remove no longer clobbers non-owning refs
====================================================

An issue arises when bpf_rbtree_remove fails, though. Consider this
example:

  struct node_data {
    long key;
    struct bpf_list_node l;
    struct bpf_rb_node r;
    struct bpf_refcount ref;
  };

  long failed_sum;

  void bpf_prog()
  {
    struct node_data *n = bpf_obj_new(/* ... */);
    struct bpf_rb_node *res;
    n->key = 10;

    bpf_spin_lock(&glock);

    bpf_list_push_back(&some_list, &n->l); /* n is now a non-owning ref */
    res = bpf_rbtree_remove(&some_tree, &n->r, /* ... */);
    if (!res)
      failed_sum += n->key;  /* fails verification before this patch */

    bpf_spin_unlock(&glock);
    /* if (res) { do something useful and drop } ... */
  }

The bpf_rbtree_remove in this example will always fail. Similarly to
bpf_spin_unlock, bpf_rbtree_remove is a non-owning reference
invalidation point. The verifier clobbers all non-owning refs after a
bpf_rbtree_remove call, so the "failed_sum += n->key" line will fail
verification, and in fact there's no good way to get information about
the node which failed to be removed after the invalidation. This patch
removes non-owning reference invalidation from bpf_rbtree_remove to
allow the above use case to pass verification. The logic for why this
is now possible is as follows:

Before this series, bpf_rbtree_remove couldn't fail and thus assumed
that its input, a non-owning reference, was in the tree. But it's easy
to construct an example where two non-owning references pointing to the
same underlying memory are acquired and passed to rbtree_remove one
after another, as sketched below (see rbtree_api_release_aliasing in
selftests/bpf/progs/rbtree_fail.c).
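
Condensed from that selftest (NULL checks and setup omitted for
brevity), the aliasing pattern looks roughly like:

  /* groot already contains one node */
  bpf_spin_lock(&glock);

  res = bpf_rbtree_first(&groot);
  o = container_of(res, struct node_data, node); /* non-owning ref */

  res = bpf_rbtree_first(&groot);
  m = container_of(res, struct node_data, node); /* aliases o */

  res = bpf_rbtree_remove(&groot, &m->node);  /* succeeds: owning ref */
  res2 = bpf_rbtree_remove(&groot, &o->node); /* same node: returns NULL */

  bpf_spin_unlock(&glock);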

So it was necessary to clobber non-owning refs to prevent this case
and, more generally, to enforce the "non-owning ref is definitely in
some collection" invariant. This series removes that invariant, and the
failure / runtime checking added in this patch provides a clean way to
deal with the aliasing issue - just fail to remove.
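
The runtime check itself is cheap. RB_EMPTY_NODE(n) is true when
n->__rb_parent_color points back at n, which is the state RB_CLEAR_NODE
leaves behind; the bpf_rb_node field of a node freshly returned by
bpf_obj_new is zeroed instead, so that state is normalized first.
Roughly, per the helpers.c hunk below:

  if (!n->__rb_parent_color)  /* zeroed node from bpf_obj_new */
          RB_CLEAR_NODE(n);   /* normalize to "not in a tree" */

  if (RB_EMPTY_NODE(n))       /* not in any tree: fail the remove */
          return NULL;

  rb_erase_cached(n, r);
  RB_CLEAR_NODE(n);           /* mark removed for later remove calls */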

Because clobbering non-owning refs is no longer needed to prevent this
aliasing, this patch removes the invalidate_non_owning_refs call from
verifier handling of bpf_rbtree_remove. Note that bpf_spin_unlock - the
other caller of invalidate_non_owning_refs - clobbers non-owning refs
for a different reason, so its clobbering behavior remains unchanged.

No BPF program changes are necessary for programs to remain valid as a
result of this clobbering change: any program that passed verification
before this patch did so with its non-owning refs having shorter (or
equal) lifetimes due to more aggressive clobbering, so extending those
lifetimes cannot invalidate it.

Also, update existing tests to check bpf_rbtree_remove retval for NULL
where necessary, and move rbtree_api_release_aliasing from
progs/rbtree_fail.c to progs/rbtree.c since it's now expected to pass
verification.

Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
Link: https://lore.kernel.org/r/20230415201811.343116-8-davemarchevsky@fb.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
kernel/bpf/btf.c
kernel/bpf/helpers.c
kernel/bpf/verifier.c
tools/testing/selftests/bpf/prog_tests/linked_list.c
tools/testing/selftests/bpf/prog_tests/rbtree.c
tools/testing/selftests/bpf/progs/rbtree.c
tools/testing/selftests/bpf/progs/rbtree_fail.c

diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
index 14889fd5ba8e38b8f0e763f8524c3ecc1dfc279d..027f9f8a355169ef798390b6ccc8070d2793fb34 100644
@@ -3805,25 +3805,8 @@ struct btf_record *btf_parse_fields(const struct btf *btf, const struct btf_type
                goto end;
        }
 
-       /* need collection identity for non-owning refs before allowing this
-        *
-        * Consider a node type w/ both list and rb_node fields:
-        *   struct node {
-        *     struct bpf_list_node l;
-        *     struct bpf_rb_node r;
-        *   }
-        *
-        * Used like so:
-        *   struct node *n = bpf_obj_new(....);
-        *   bpf_list_push_front(&list_head, &n->l);
-        *   bpf_rbtree_remove(&rb_root, &n->r);
-        *
-        * It should not be possible to rbtree_remove the node since it hasn't
-        * been added to a tree. But push_front converts n to a non-owning
-        * reference, and rbtree_remove accepts the non-owning reference to
-        * a type w/ bpf_rb_node field.
-        */
-       if (btf_record_has_field(rec, BPF_LIST_NODE) &&
+       if (rec->refcount_off < 0 &&
+           btf_record_has_field(rec, BPF_LIST_NODE) &&
            btf_record_has_field(rec, BPF_RB_NODE)) {
                ret = -EINVAL;
                goto end;
diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index 5067f8d468728aa6cce353a1f8e50555c184d2be..1835df33328719958ce3b36560c8df391e3b5d0a 100644
@@ -2000,6 +2000,12 @@ __bpf_kfunc struct bpf_rb_node *bpf_rbtree_remove(struct bpf_rb_root *root,
        struct rb_root_cached *r = (struct rb_root_cached *)root;
        struct rb_node *n = (struct rb_node *)node;
 
+       if (!n->__rb_parent_color)
+               RB_CLEAR_NODE(n);
+
+       if (RB_EMPTY_NODE(n))
+               return NULL;
+
        rb_erase_cached(n, r);
        RB_CLEAR_NODE(n);
        return (struct bpf_rb_node *)n;
@@ -2328,7 +2334,7 @@ BTF_ID_FLAGS(func, bpf_list_pop_front, KF_ACQUIRE | KF_RET_NULL)
 BTF_ID_FLAGS(func, bpf_list_pop_back, KF_ACQUIRE | KF_RET_NULL)
 BTF_ID_FLAGS(func, bpf_task_acquire, KF_ACQUIRE | KF_RCU | KF_RET_NULL)
 BTF_ID_FLAGS(func, bpf_task_release, KF_RELEASE)
-BTF_ID_FLAGS(func, bpf_rbtree_remove, KF_ACQUIRE)
+BTF_ID_FLAGS(func, bpf_rbtree_remove, KF_ACQUIRE | KF_RET_NULL)
 BTF_ID_FLAGS(func, bpf_rbtree_add_impl)
 BTF_ID_FLAGS(func, bpf_rbtree_first, KF_RET_NULL)
 
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 736cb7cec0bd395574871d119acded4e0716b20d..6a41b69a424ea7069f89577f659add1f90b9f59f 100644
@@ -10922,9 +10922,6 @@ static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
                        ref_set_non_owning(env, &regs[BPF_REG_0]);
                }
 
-               if (meta.func_id == special_kfunc_list[KF_bpf_rbtree_remove])
-                       invalidate_non_owning_refs(env);
-
                if (reg_may_point_to_spin_lock(&regs[BPF_REG_0]) && !regs[BPF_REG_0].id)
                        regs[BPF_REG_0].id = ++env->id_gen;
        } else if (btf_type_is_void(t)) {
diff --git a/tools/testing/selftests/bpf/prog_tests/linked_list.c b/tools/testing/selftests/bpf/prog_tests/linked_list.c
index 872e4bd500fdb1c406270c3002d77c7df685a70c..f63309fd0e28e2ca4db524172e3489b831ca54f0 100644
@@ -266,6 +266,59 @@ end:
        return NULL;
 }
 
+static void list_and_rb_node_same_struct(bool refcount_field)
+{
+       int bpf_rb_node_btf_id, bpf_refcount_btf_id, foo_btf_id;
+       struct btf *btf;
+       int id, err;
+
+       btf = init_btf();
+       if (!ASSERT_OK_PTR(btf, "init_btf"))
+               return;
+
+       bpf_rb_node_btf_id = btf__add_struct(btf, "bpf_rb_node", 24);
+       if (!ASSERT_GT(bpf_rb_node_btf_id, 0, "btf__add_struct bpf_rb_node"))
+               return;
+
+       if (refcount_field) {
+               bpf_refcount_btf_id = btf__add_struct(btf, "bpf_refcount", 4);
+               if (!ASSERT_GT(bpf_refcount_btf_id, 0, "btf__add_struct bpf_refcount"))
+                       return;
+       }
+
+       id = btf__add_struct(btf, "bar", refcount_field ? 44 : 40);
+       if (!ASSERT_GT(id, 0, "btf__add_struct bar"))
+               return;
+       err = btf__add_field(btf, "a", LIST_NODE, 0, 0);
+       if (!ASSERT_OK(err, "btf__add_field bar::a"))
+               return;
+       err = btf__add_field(btf, "c", bpf_rb_node_btf_id, 128, 0);
+       if (!ASSERT_OK(err, "btf__add_field bar::c"))
+               return;
+       if (refcount_field) {
+               err = btf__add_field(btf, "ref", bpf_refcount_btf_id, 320, 0);
+               if (!ASSERT_OK(err, "btf__add_field bar::ref"))
+                       return;
+       }
+
+       foo_btf_id = btf__add_struct(btf, "foo", 20);
+       if (!ASSERT_GT(foo_btf_id, 0, "btf__add_struct foo"))
+               return;
+       err = btf__add_field(btf, "a", LIST_HEAD, 0, 0);
+       if (!ASSERT_OK(err, "btf__add_field foo::a"))
+               return;
+       err = btf__add_field(btf, "b", SPIN_LOCK, 128, 0);
+       if (!ASSERT_OK(err, "btf__add_field foo::b"))
+               return;
+       id = btf__add_decl_tag(btf, "contains:bar:a", foo_btf_id, 0);
+       if (!ASSERT_GT(id, 0, "btf__add_decl_tag contains:bar:a"))
+               return;
+
+       err = btf__load_into_kernel(btf);
+       ASSERT_EQ(err, refcount_field ? 0 : -EINVAL, "check btf");
+       btf__free(btf);
+}
+
 static void test_btf(void)
 {
        struct btf *btf = NULL;
@@ -717,39 +770,12 @@ static void test_btf(void)
        }
 
        while (test__start_subtest("btf: list_node and rb_node in same struct")) {
-               btf = init_btf();
-               if (!ASSERT_OK_PTR(btf, "init_btf"))
-                       break;
-
-               id = btf__add_struct(btf, "bpf_rb_node", 24);
-               if (!ASSERT_EQ(id, 5, "btf__add_struct bpf_rb_node"))
-                       break;
-               id = btf__add_struct(btf, "bar", 40);
-               if (!ASSERT_EQ(id, 6, "btf__add_struct bar"))
-                       break;
-               err = btf__add_field(btf, "a", LIST_NODE, 0, 0);
-               if (!ASSERT_OK(err, "btf__add_field bar::a"))
-                       break;
-               err = btf__add_field(btf, "c", 5, 128, 0);
-               if (!ASSERT_OK(err, "btf__add_field bar::c"))
-                       break;
-
-               id = btf__add_struct(btf, "foo", 20);
-               if (!ASSERT_EQ(id, 7, "btf__add_struct foo"))
-                       break;
-               err = btf__add_field(btf, "a", LIST_HEAD, 0, 0);
-               if (!ASSERT_OK(err, "btf__add_field foo::a"))
-                       break;
-               err = btf__add_field(btf, "b", SPIN_LOCK, 128, 0);
-               if (!ASSERT_OK(err, "btf__add_field foo::b"))
-                       break;
-               id = btf__add_decl_tag(btf, "contains:bar:a", 7, 0);
-               if (!ASSERT_EQ(id, 8, "btf__add_decl_tag contains:bar:a"))
-                       break;
+               list_and_rb_node_same_struct(true);
+               break;
+       }
 
-               err = btf__load_into_kernel(btf);
-               ASSERT_EQ(err, -EINVAL, "check btf");
-               btf__free(btf);
+       while (test__start_subtest("btf: list_node and rb_node in same struct, no bpf_refcount")) {
+               list_and_rb_node_same_struct(false);
                break;
        }
 }
diff --git a/tools/testing/selftests/bpf/prog_tests/rbtree.c b/tools/testing/selftests/bpf/prog_tests/rbtree.c
index 156fa95c42f61842856ced0649b56330e8ff43bd..e9300c96607d3148c2ffc9a241390bb490495f93 100644
@@ -77,6 +77,29 @@ static void test_rbtree_first_and_remove(void)
        rbtree__destroy(skel);
 }
 
+static void test_rbtree_api_release_aliasing(void)
+{
+       LIBBPF_OPTS(bpf_test_run_opts, opts,
+                   .data_in = &pkt_v4,
+                   .data_size_in = sizeof(pkt_v4),
+                   .repeat = 1,
+       );
+       struct rbtree *skel;
+       int ret;
+
+       skel = rbtree__open_and_load();
+       if (!ASSERT_OK_PTR(skel, "rbtree__open_and_load"))
+               return;
+
+       ret = bpf_prog_test_run_opts(bpf_program__fd(skel->progs.rbtree_api_release_aliasing), &opts);
+       ASSERT_OK(ret, "rbtree_api_release_aliasing");
+       ASSERT_OK(opts.retval, "rbtree_api_release_aliasing retval");
+       ASSERT_EQ(skel->data->first_data[0], 42, "rbtree_api_release_aliasing first rbtree_remove()");
+       ASSERT_EQ(skel->data->first_data[1], -1, "rbtree_api_release_aliasing second rbtree_remove()");
+
+       rbtree__destroy(skel);
+}
+
 void test_rbtree_success(void)
 {
        if (test__start_subtest("rbtree_add_nodes"))
@@ -85,6 +108,8 @@ void test_rbtree_success(void)
                test_rbtree_add_and_remove();
        if (test__start_subtest("rbtree_first_and_remove"))
                test_rbtree_first_and_remove();
+       if (test__start_subtest("rbtree_api_release_aliasing"))
+               test_rbtree_api_release_aliasing();
 }
 
 #define BTF_FAIL_TEST(suffix)                                                                  \
diff --git a/tools/testing/selftests/bpf/progs/rbtree.c b/tools/testing/selftests/bpf/progs/rbtree.c
index 4c90aa6abddd95087fb8e6d4b75d99a026ee08fc..b09f4fffe57cfde7b42831462371643055f35b9f 100644
@@ -93,9 +93,11 @@ long rbtree_add_and_remove(void *ctx)
        res = bpf_rbtree_remove(&groot, &n->node);
        bpf_spin_unlock(&glock);
 
+       if (!res)
+               return 1;
+
        n = container_of(res, struct node_data, node);
        removed_key = n->key;
-
        bpf_obj_drop(n);
 
        return 0;
@@ -148,9 +150,11 @@ long rbtree_first_and_remove(void *ctx)
        res = bpf_rbtree_remove(&groot, &o->node);
        bpf_spin_unlock(&glock);
 
+       if (!res)
+               return 5;
+
        o = container_of(res, struct node_data, node);
        removed_key = o->key;
-
        bpf_obj_drop(o);
 
        bpf_spin_lock(&glock);
@@ -173,4 +177,70 @@ err_out:
        return 1;
 }
 
+SEC("tc")
+long rbtree_api_release_aliasing(void *ctx)
+{
+       struct node_data *n, *m, *o;
+       struct bpf_rb_node *res, *res2;
+
+       n = bpf_obj_new(typeof(*n));
+       if (!n)
+               return 1;
+       n->key = 41;
+       n->data = 42;
+
+       bpf_spin_lock(&glock);
+       bpf_rbtree_add(&groot, &n->node, less);
+       bpf_spin_unlock(&glock);
+
+       bpf_spin_lock(&glock);
+
+       /* m and o point to the same node,
+        * but verifier doesn't know this
+        */
+       res = bpf_rbtree_first(&groot);
+       if (!res)
+               goto err_out;
+       o = container_of(res, struct node_data, node);
+
+       res = bpf_rbtree_first(&groot);
+       if (!res)
+               goto err_out;
+       m = container_of(res, struct node_data, node);
+
+       res = bpf_rbtree_remove(&groot, &m->node);
+       /* Retval of previous remove returns an owning reference to m,
+        * which is the same node non-owning ref o is pointing at.
+        * We can safely try to remove o as the second rbtree_remove will
+        * return NULL since the node isn't in a tree.
+        *
+        * Previously we relied on the verifier type system + rbtree_remove
+        * invalidating non-owning refs to ensure that rbtree_remove couldn't
+        * fail, but now rbtree_remove does runtime checking so we no longer
+        * invalidate non-owning refs after remove.
+        */
+       res2 = bpf_rbtree_remove(&groot, &o->node);
+
+       bpf_spin_unlock(&glock);
+
+       if (res) {
+               o = container_of(res, struct node_data, node);
+               first_data[0] = o->data;
+               bpf_obj_drop(o);
+       }
+       if (res2) {
+               /* The second remove fails, so res2 is null and this doesn't
+                * execute
+                */
+               m = container_of(res2, struct node_data, node);
+               first_data[1] = m->data;
+               bpf_obj_drop(m);
+       }
+       return 0;
+
+err_out:
+       bpf_spin_unlock(&glock);
+       return 1;
+}
+
 char _license[] SEC("license") = "GPL";
diff --git a/tools/testing/selftests/bpf/progs/rbtree_fail.c b/tools/testing/selftests/bpf/progs/rbtree_fail.c
index 46d7d18a218fd3e1607d309ce2766ce57001c069..3fecf1c6dfe57da96b80f6f60a2079d0fd5b4aec 100644
@@ -105,7 +105,7 @@ long rbtree_api_remove_unadded_node(void *ctx)
 }
 
 SEC("?tc")
-__failure __msg("Unreleased reference id=2 alloc_insn=10")
+__failure __msg("Unreleased reference id=3 alloc_insn=10")
 long rbtree_api_remove_no_drop(void *ctx)
 {
        struct bpf_rb_node *res;
@@ -118,11 +118,13 @@ long rbtree_api_remove_no_drop(void *ctx)
 
        res = bpf_rbtree_remove(&groot, res);
 
-       n = container_of(res, struct node_data, node);
-       __sink(n);
+       if (res) {
+               n = container_of(res, struct node_data, node);
+               __sink(n);
+       }
        bpf_spin_unlock(&glock);
 
-       /* bpf_obj_drop(n) is missing here */
+       /* if (res) { bpf_obj_drop(n); } is missing here */
        return 0;
 
 unlock_err:
@@ -150,35 +152,36 @@ long rbtree_api_add_to_multiple_trees(void *ctx)
 }
 
 SEC("?tc")
-__failure __msg("rbtree_remove node input must be non-owning ref")
-long rbtree_api_add_release_unlock_escape(void *ctx)
+__failure __msg("dereference of modified ptr_or_null_ ptr R2 off=16 disallowed")
+long rbtree_api_use_unchecked_remove_retval(void *ctx)
 {
-       struct node_data *n;
-
-       n = bpf_obj_new(typeof(*n));
-       if (!n)
-               return 1;
+       struct bpf_rb_node *res;
 
        bpf_spin_lock(&glock);
-       bpf_rbtree_add(&groot, &n->node, less);
+
+       res = bpf_rbtree_first(&groot);
+       if (!res)
+               goto err_out;
+       res = bpf_rbtree_remove(&groot, res);
+
        bpf_spin_unlock(&glock);
 
        bpf_spin_lock(&glock);
-       /* After add() in previous critical section, n should be
-        * release_on_unlock and released after previous spin_unlock,
-        * so should not be possible to use it here
-        */
-       bpf_rbtree_remove(&groot, &n->node);
+       /* Must check res for NULL before using in rbtree_add below */
+       bpf_rbtree_add(&groot, res, less);
        bpf_spin_unlock(&glock);
        return 0;
+
+err_out:
+       bpf_spin_unlock(&glock);
+       return 1;
 }
 
 SEC("?tc")
 __failure __msg("rbtree_remove node input must be non-owning ref")
-long rbtree_api_release_aliasing(void *ctx)
+long rbtree_api_add_release_unlock_escape(void *ctx)
 {
-       struct node_data *n, *m, *o;
-       struct bpf_rb_node *res;
+       struct node_data *n;
 
        n = bpf_obj_new(typeof(*n));
        if (!n)
@@ -189,37 +192,11 @@ long rbtree_api_release_aliasing(void *ctx)
        bpf_spin_unlock(&glock);
 
        bpf_spin_lock(&glock);
-
-       /* m and o point to the same node,
-        * but verifier doesn't know this
-        */
-       res = bpf_rbtree_first(&groot);
-       if (!res)
-               return 1;
-       o = container_of(res, struct node_data, node);
-
-       res = bpf_rbtree_first(&groot);
-       if (!res)
-               return 1;
-       m = container_of(res, struct node_data, node);
-
-       bpf_rbtree_remove(&groot, &m->node);
-       /* This second remove shouldn't be possible. Retval of previous
-        * remove returns owning reference to m, which is the same
-        * node o's non-owning ref is pointing at
-        *
-        * In order to preserve property
-        *   * owning ref must not be in rbtree
-        *   * non-owning ref must be in rbtree
-        *
-        * o's ref must be invalidated after previous remove. Otherwise
-        * we'd have non-owning ref to node that isn't in rbtree, and
-        * verifier wouldn't be able to use type system to prevent remove
-        * of ref that already isn't in any tree. Would have to do runtime
-        * checks in that case.
+       /* After add() in previous critical section, n should be
+        * release_on_unlock and released after previous spin_unlock,
+        * so should not be possible to use it here
         */
-       bpf_rbtree_remove(&groot, &o->node);
-
+       bpf_rbtree_remove(&groot, &n->node);
        bpf_spin_unlock(&glock);
        return 0;
 }