examples/c: add hashing and naive substring search algo #331
anakryiko wants to merge 1 commit into libbpf:master
Hi, Andrii. I see some annotations in the program, like __arg_nonnull. Do these help the compiler or the verifier optimize their work?
__arg_nonnull is an annotation that can be applied to arguments of a global subprog. A global subprog is verified by the BPF verifier in isolation from the main program, based on the function's type signature. That is a more restricted way to verify, but it also lets BPF verification scale much better, because we create smaller isolated pieces of logic that the BPF verifier doesn't have to re-validate every single time. The annotation tells the BPF verifier that this argument can't be NULL. The verifier assumes this when validating the body of that subprogram, and also enforces it when other code calls into the subprogram.
Hope this helps.
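To make the annotation concrete, here is a minimal sketch of what an annotated global subprog could look like. The function name bounded_len and the MAX_LEN limit are hypothetical and are not taken from this PR. The macro definition is meant to mirror the one in libbpf's bpf_helpers.h as far as I know; outside a BPF build it falls back to a no-op so the snippet compiles as plain C.

```c
#include <stddef.h>

/* Assumption: in a real BPF program this macro comes from libbpf's
 * bpf_helpers.h. It is re-declared here only to keep the sketch
 * self-contained, and it is a no-op when not compiling for BPF. */
#ifndef __arg_nonnull
#ifdef __bpf__
#define __arg_nonnull __attribute__((btf_decl_tag("arg:nonnull")))
#else
#define __arg_nonnull
#endif
#endif

#define MAX_LEN 64 /* hypothetical bound, keeps the loop verifier-friendly */

/* Hypothetical global (non-static) subprog. With the annotation, the
 * verifier may assume `s` is not NULL while checking this body, and it
 * rejects callers that could pass NULL. */
int bounded_len(const char *s __arg_nonnull)
{
	int i;

	/* No NULL check needed here: the annotation promises a valid pointer. */
	for (i = 0; i < MAX_LEN; i++) {
		if (s[i] == '\0')
			break;
	}
	return i;
}
```

Without the annotation, the body would typically need its own `if (!s) return 0;` style check, because a subprog verified in isolation cannot see what its callers pass in.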
Also benchmark it a little. Performance will obviously depend on the haystack and needle strings, but the hashing implementation seems to be on par with the naive implementation for short strings, and gets relatively faster as strings become longer and/or the pattern match happens further into the string.
E.g., for searching "ra" in "abracadabra" (end of short string):
substr-2084331 [012] ..... 2514091.887184: bpf_trace_printk: BENCH HASHED 156 ns/iter
substr-2084331 [012] ..... 2514091.891784: bpf_trace_printk: BENCH NAIVE 183 ns/iter
For searching "eaba" in "abacabadabacabaeabacabadabacaba" (middle of longer string):
substr-2082624 [015] ..... 2514066.577106: bpf_trace_printk: BENCH HASHED 289 ns/iter
substr-2082624 [015] ..... 2514066.588243: bpf_trace_printk: BENCH NAIVE 445 ns/iter
But searching all occurrences of "a" inside "abracadabra" (almost immediate match in rather short string):
substr-2111313 [078] ..... 2514466.822019: bpf_trace_printk: BENCH HASHED 259 ns/iter
substr-2111313 [078] ..... 2514466.827745: bpf_trace_printk: BENCH NAIVE 228 ns/iter
Overall, the hashed variant seems best from a practical point of view.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
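For readers who want to see the two approaches side by side, below is a rough userspace sketch. It is not the code from this PR, which is not shown in this conversation. It assumes a Rabin-Karp style rolling hash for the "hashed" variant, and the names find_naive, find_hashed and HASH_BASE are made up for illustration. A BPF version would additionally need bounded loops and fixed-size buffers.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define HASH_BASE 257u /* hypothetical multiplier for the polynomial hash */

/* Naive search: compare the needle against every haystack offset.
 * Returns the index of the first match, or -1 if there is none. */
static long find_naive(const char *hay, const char *needle)
{
	size_t n = strlen(hay), m = strlen(needle);

	if (m == 0)
		return 0;
	if (m > n)
		return -1;
	for (size_t i = 0; i + m <= n; i++) {
		if (memcmp(hay + i, needle, m) == 0)
			return (long)i;
	}
	return -1;
}

/* Hashed search (Rabin-Karp style): keep a rolling hash of the current
 * window and run the byte comparison only when the hashes agree.
 * Arithmetic is done modulo 2^32 through uint32_t wraparound. */
static long find_hashed(const char *hay, const char *needle)
{
	size_t n = strlen(hay), m = strlen(needle);
	uint32_t want = 0, have = 0, top = 1;

	if (m == 0)
		return 0;
	if (m > n)
		return -1;

	/* Hash the needle and the first window; top ends up as HASH_BASE^(m-1). */
	for (size_t i = 0; i < m; i++) {
		want = want * HASH_BASE + (unsigned char)needle[i];
		have = have * HASH_BASE + (unsigned char)hay[i];
		if (i + 1 < m)
			top *= HASH_BASE;
	}

	for (size_t i = 0; ; i++) {
		/* Equal hashes can still be a collision, so confirm with memcmp. */
		if (have == want && memcmp(hay + i, needle, m) == 0)
			return (long)i;
		if (i + m >= n)
			break;
		/* Slide the window: drop hay[i], append hay[i + m]. */
		have = (have - (unsigned char)hay[i] * top) * HASH_BASE +
		       (unsigned char)hay[i + m];
	}
	return -1;
}
```

Both functions return the same index for the same inputs, e.g. find_hashed("abracadabra", "ra") and find_naive("abracadabra", "ra") both return 2. The hashed variant does constant work per window until a hash matches, which is in line with it pulling ahead on longer strings and later matches, while the up-front hashing cost can make it slower when the match is almost immediate.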