tracing/filter: Call filter predicate functions directly via a switch statement
Due to retpolines, indirect calls are much more expensive than direct
calls. The filter uses a select set of functions for its predicates.
Instead of calling them through function pointers, create a
filter_pred_fn_call() function that uses a switch statement to call the
predicate functions directly. This gives almost a 10% speedup to the
filter logic.
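As a rough illustration of the technique (not the actual kernel code),
the sketch below replaces a per-predicate function pointer with an enum
tag and a switch, so that every call site is a direct call. The names
pred_fn_kind, struct pred, pred_gt_64(), pred_eq_64() and pred_fn_call()
are hypothetical and chosen only for this example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical predicate kinds; these names are illustrative only. */
enum pred_fn_kind {
	PRED_FN_NOP,
	PRED_FN_GT_64,
	PRED_FN_EQ_64,
};

/* Simplified stand-in for a filter predicate: a tag instead of a pointer. */
struct pred {
	enum pred_fn_kind fn_kind;
	uint64_t val;
};

static bool pred_gt_64(const struct pred *p, uint64_t field)
{
	return field > p->val;
}

static bool pred_eq_64(const struct pred *p, uint64_t field)
{
	return field == p->val;
}

/*
 * Dispatch on the tag. Each case names its callee explicitly, so the
 * predicate is reached by a direct call rather than through a function
 * pointer stored in the predicate.
 */
static bool pred_fn_call(const struct pred *p, uint64_t field)
{
	switch (p->fn_kind) {
	case PRED_FN_GT_64:
		return pred_gt_64(p, field);
	case PRED_FN_EQ_64:
		return pred_eq_64(p, field);
	case PRED_FN_NOP:
	default:
		return false;
	}
}
```

With this shape, a filter such as "delta > 0" would be represented as
{ PRED_FN_GT_64, 0 } and evaluated with pred_fn_call(&p, delta).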
Using the histogram benchmark:
Before:
# event histogram
#
# trigger info: hist:keys=delta:vals=hitcount:sort=delta:size=2048 if delta > 0 [active]
#
{ delta: 113 } hitcount:        272
{ delta: 114 } hitcount:        840
{ delta: 118 } hitcount:        344
{ delta: 119 } hitcount:      25428
{ delta: 120 } hitcount:     350590
{ delta: 121 } hitcount:    1892484
{ delta: 122 } hitcount:    6205004
{ delta: 123 } hitcount:   11583521
{ delta: 124 } hitcount:   37590979
{ delta: 125 } hitcount:  108308504
{ delta: 126 } hitcount:  131672461
{ delta: 127 } hitcount:   88700598
{ delta: 128 } hitcount:   65939870
{ delta: 129 } hitcount:   45055004
{ delta: 130 } hitcount:   33174464
{ delta: 131 } hitcount:   31813493
{ delta: 132 } hitcount:   29011676
{ delta: 133 } hitcount:   22798782
{ delta: 134 } hitcount:   22072486
{ delta: 135 } hitcount:   17034113
{ delta: 136 } hitcount:    8982490
{ delta: 137 } hitcount:    2865908
{ delta: 138 } hitcount:     980382
{ delta: 139 } hitcount:    1651944
{ delta: 140 } hitcount:    4112073
{ delta: 141 } hitcount:    3963269
{ delta: 142 } hitcount:    1712508
{ delta: 143 } hitcount:     575941
After:
# event histogram
#
# trigger info: hist:keys=delta:vals=hitcount:sort=delta:size=2048 if delta > 0 [active]
#
{ delta: 103 } hitcount:         60
{ delta: 104 } hitcount:      16966
{ delta: 105 } hitcount:     396625
{ delta: 106 } hitcount:    3223400
{ delta: 107 } hitcount:   12053754
{ delta: 108 } hitcount:   20241711
{ delta: 109 } hitcount:   14850200
{ delta: 110 } hitcount:    4946599
{ delta: 111 } hitcount:    3479315
{ delta: 112 } hitcount:   18698299
{ delta: 113 } hitcount:   62388733
{ delta: 114 } hitcount:   95803834
{ delta: 115 } hitcount:   58278130
{ delta: 116 } hitcount:   15364800
{ delta: 117 } hitcount:    5586866
{ delta: 118 } hitcount:    2346880
{ delta: 119 } hitcount:    1131091
{ delta: 120 } hitcount:     620896
{ delta: 121 } hitcount:     236652
{ delta: 122 } hitcount:     105957
{ delta: 123 } hitcount:     119107
{ delta: 124 } hitcount:      54494
{ delta: 125 } hitcount:      63856
{ delta: 126 } hitcount:      64454
{ delta: 127 } hitcount:      34818
{ delta: 128 } hitcount:      41446
{ delta: 129 } hitcount:      51242
{ delta: 130 } hitcount:      28361
{ delta: 131 } hitcount:      23926
The peak went from 126ns per event before to 114ns after (about 9.5%
faster), and the fastest time went from 113ns to 103ns (about 8.8%).
Link: https://lkml.kernel.org/r/20220906225529.781407172@goodmis.org
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Tom Zanussi <zanussi@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>