net_sched: get rid of rcu_barrier() in tcf_block_put_ext()
authorCong Wang <xiyou.wangcong@gmail.com>
Mon, 4 Dec 2017 18:48:18 +0000 (10:48 -0800)
committerGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Sat, 3 Mar 2018 09:24:39 +0000 (10:24 +0100)
commitac2be03ba64febc467e2df4fceb2a053408cc833
tree40a6f4c8ab234cf52b3317c9d060adad1c07bb7e
parent1c8e7e61cbdf8792cef437afe5c5cdb26f7773df
net_sched: get rid of rcu_barrier() in tcf_block_put_ext()

commit efbf78973978b0d25af59bc26c8013a942af6e64 upstream.

Both Eric and Paolo noticed the rcu_barrier() we use in
tcf_block_put_ext() could be a performance bottleneck when
we have a lot of tc classes.

Paolo provided the following to demonstrate the issue:

tc qdisc add dev lo root htb
for I in `seq 1 1000`; do
        tc class add dev lo parent 1: classid 1:$I htb rate 100kbit
        tc qdisc add dev lo parent 1:$I handle $((I + 1)): htb
        for J in `seq 1 10`; do
                tc filter add dev lo parent $((I + 1)): u32 match ip src 1.1.1.$J
        done
done
time tc qdisc del dev root

real    0m54.764s
user    0m0.023s
sys     0m0.000s

The rcu_barrier() there is to ensure we free the block after all chains
are gone, that is, to queue tcf_block_put_final() at the tail of workqueue.
We can achieve this ordering requirement by refcnt'ing tcf block instead,
that is, the tcf block is freed only when the last chain in this block is
gone. This also simplifies the code.

Paolo reported after this patch we get:

real    0m0.017s
user    0m0.000s
sys     0m0.017s

Tested-by: Paolo Abeni <pabeni@redhat.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Jiri Pirko <jiri@mellanox.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
include/net/sch_generic.h
net/sched/cls_api.c