Convert string and char_buf matches to hash look ups #1408
Conversation
Hi,
Had a look at the first patch. The idea of using hash maps makes perfect sense to me, so thanks for taking this on.
One thing I'm not so sure about is whether it is worth the complexity of having multiple maps instead of one. As far as I understand the motivation here is to save space. Could you provide some examples of how much space we would save using this approach? I think it depends on the use-case but my feeling is that if we are talking about 10-20 string values, then the difference is not that big.
In any case, I would suggest that we keep the Go structures simpler than what we do in BPF, and then deal with the exact structure of the BPF maps when we load the data.
The motivation is to save CPU cycles in the look up code. If we only have one hash map with a key size of (say) 128 bytes, then every string we try to look up will require hashing 128 bytes, regardless of whether many of them are 0x00. Using several maps with different key sizes means we only hash a number of bytes that is closer to the length of the string we wish to look up.
The problem with keeping the Go structures simpler is that we need to know the array indices of the hash maps at the point that we build the kernel selectors. The easiest way to know these is to make lists of hash maps that mirror those in BPF. That said, some level of harmonisation is possible, which I shall do.
During an offline sync we discussed quickly why there are different maps, and to me it is fine. Maybe it is complex from a readability point of view, but from a performance point of view, even if we don't have numbers: 1. these selectors were not working properly and are now being fixed, and 2. file name and string matching is the most basic and easy use case. Users may watch many different paths, so having better benchmark results on top is always good.
(1) and (2) make sense to me, but they could also be implemented with a single map for all sizes. Having different maps for different sizes is a performance optimization (for CPU cycles as you mention above), and, given the code complexity it introduces, I think it's reasonable to ask: is it worth it? How much time do we save compared to when we use a single map for all sizes?
The jhash cost is a little over one CPU cycle per byte of data to hash, but we could call it equal; i.e. hashing 128 bytes costs roughly 128 CPU cycles (see https://fosdem.org/2023/schedule/event/bpf_hashing/ for details/confirmation). The existing byte-by-byte comparison has a CPU cost that similarly grows with the length of the data to compare. While fixing (1) and (2) above, we also don't want to incur extra CPU cost if we can avoid it.

In this situation, we want to give users the ability to specify fairly long strings/filenames to match on – say ~128 bytes. If we take the simplistic approach of a single map, then all look ups will require hashing 128 bytes, at a cost of ~128 CPU cycles. By dividing the strings into different maps, we reduce this cost to approximately the size of the key. Assuming many look up strings are less than 24 bytes in length, those look ups would cost 25 CPU cycles compared with 128 CPU cycles. The next biggest category, slightly longer strings, would cost 49 CPU cycles compared to 128 CPU cycles.

Compared to these gains, the code isn't really that complex. Rather than putting all the strings in the same map, they are simply divided into six maps based on their length; then at look up, the appropriate map is chosen based on the length of the look up string. An advantage of this approach is that we could increase the maximum string length to, say, 1024 bytes with no additional CPU cost – there may be a small memory cost, however.

Given that Tetragon hooks kernel functions and performs comparisons during many of those hooks, it always costs the system some CPU time. Looking at Sysmon (Windows) as an example of a tool with millions of installs, many of its use cases feature string and filename matching. IMO it's not a case of "Is it worth it?" but more a case of "How could we justify not doing it?"
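To make the bucketing concrete, here is a minimal sketch of how a look up string could be assigned to one of the six key sizes (25, 49, 73, 97, 121 and 145 bytes, as listed in the commit message further down). The names and exact selection logic are illustrative assumptions, not the actual Tetragon code.

```c
/* Illustrative only: choose the smallest of the six key sizes that can
 * hold the look up string plus its one-byte length prefix. */
static const unsigned int string_key_sizes[6] = { 25, 49, 73, 97, 121, 145 };

static int string_map_index(unsigned int str_len)
{
	for (int i = 0; i < 6; i++) {
		/* one byte of every key is reserved for the length prefix */
		if (str_len + 1 <= string_key_sizes[i])
			return i;
	}
	return -1; /* longer than the largest supported key */
}
```

With this, a 20-byte string lands in the 25-byte map (roughly 25 cycles of hashing) while a 100-byte string lands in the 121-byte map, instead of every string paying for a 128-byte key.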
I've taken some measurements using 50,000,000 calls to a kprobe:

* The difference between no hash and the hash based on 6 maps is 60.2ns/kprobe.
* The difference between no hash and the hash based on a single map is 127.6ns/kprobe.
* The difference between the hash based on 6 maps and the hash based on a single map is 67.4ns/kprobe.

Objectively, the hash for a single map is more than twice the cost of the hash for 6 maps. This is admittedly very small compared to the total cost of the kprobe (2.92us), and represents an increase of 2.3%.
I'm getting the following error for the spec below, which I assume should work now.
It seems there's a double iteration over the values, first in writeMatchValues and then in writeMatchStrings.
That's odd. I'll take a look. Thanks for trying it out.
So I think after slowly reviewing this I learned many things, thanks; that's a really cool use of BPF_MAP_TYPE_LPM_TRIE! I must admit the multiple maps + padding stuff does make everything a bit more complex and hard to read. I wonder, after your benchmarks, whether you want to remove that optimization or not.

Anyway, this would be helpful as well for #1278; as of now we use sets there as well for In and NotIn, which we could replace with these already implemented operators. Paths will be similar to string or char_buf.
I made some annoying remarks because things were confusing or just a matter of convention; feel free to ignore them, just saying.
It would also be good to do a pass on the documentation (after this PR) to remove the trailing \0; even if they are harmless (thanks to your fix), they are confusing:
- - "/etc/passwd\0"
+ - "/etc/passwd"
Yep, that's what it was – I fixed it in the last push. When I simplified the code I moved writeMatchStrings into writeMatchValues as I thought it made more sense there, but this introduced the double iteration. All looks good now.

Regarding mapid/map_idx: consider just the map with the smallest key (key size is 25 bytes). A single kprobe could have more than one string Equal match section – maybe for different parameters or different actions. Therefore there could be 2 or more hash tables of strings, one for each match section. Each of these is placed inside an array_of_hash map at indices 0, 1, 2... These indices are then stored in the kernel selectors, so each match knows which hash table to use.

But of course there are six key sizes, therefore six array_of_hash maps, and therefore the above process happens six times. In the selector we store the index of the relevant hash map in the first array_of_hash, then the index of the relevant hash map in the second array_of_hash, and so on for all six.

In BPF, we take the input string, work out which key size it falls into, get the index for that hash map from the selector, and look the hash map up in the relevant array_of_hash. Then we can pad our string to that key size and look the string up in that hash map. HTH
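To help picture that flow, here is a rough BPF C sketch for just the smallest key size, using hypothetical names (string_maps_0, string_key_25, match_string_25); the real Tetragon map and struct definitions differ.

```c
#include <linux/bpf.h>
#include <stdbool.h>
#include <bpf/bpf_helpers.h>

/* Inner map template: a hash of 25-byte keys (1 length byte + up to 24
 * string bytes). One such hash map exists per Equal match section. */
struct string_hash_25 {
	__uint(type, BPF_MAP_TYPE_HASH);
	__uint(max_entries, 512);
	__uint(key_size, 25);
	__uint(value_size, 1);
};

/* Outer array_of_hash for this key size; the selector stores the index
 * of the inner hash map that belongs to each match section. */
struct {
	__uint(type, BPF_MAP_TYPE_ARRAY_OF_MAPS);
	__uint(max_entries, 16);
	__uint(key_size, sizeof(__u32));
	__uint(value_size, sizeof(__u32));
	__array(values, struct string_hash_25);
} string_maps_0 SEC(".maps");

struct string_key_25 {
	__u8 len;      /* real string length */
	char data[24]; /* string, zero padded to the key size */
};

static inline bool match_string_25(__u32 map_idx, const char *str, __u32 len)
{
	struct string_key_25 key = {};
	void *inner;

	if (len > sizeof(key.data))
		return false;

	/* map_idx comes from the kernel selector: which hash map inside
	 * this array_of_hash belongs to the current match section. */
	inner = bpf_map_lookup_elem(&string_maps_0, &map_idx);
	if (!inner)
		return false;

	/* pad the string into the fixed-size key and look it up */
	key.len = len;
	if (bpf_probe_read_kernel(key.data, len, str) < 0)
		return false;
	return bpf_map_lookup_elem(inner, &key) != NULL;
}
```

The same pattern repeats for the other five key sizes, with the selector carrying one index per array_of_hash.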
I'd like to see more tests given the size of the new code (also for prefix/postfix), but I guess that can be added later; anyway, it looks good to me now.
Thanks for the approval. My thought about tests was that this isn't new functionality, so while more tests might be a good idea, we were already okay with the testing of the existing implementation (which happened to have bugs that I fixed a little while back!).
Yep, understandable. It would be nice, though, to add some tests on the limits, like the one Jiri proposed in #1408 (comment). But as he said, we can add them later.
I added tests to the first commit for the 6-map string look up, all focused on the limits. Prefix and postfix could potentially benefit from tests at the max string length, but I don't feel they are urgent. The fact that tests already exist for them (from the previous implementation) satisfies me.
Thanks! This looks good to me.
I have some suggestions, but I feel the PR is definitely an improvement from the current state of things, so I'm happy to merge it as is. The only thing I would like to clarify before merging is the numbers. First, thanks for measuring! The summary seems to be that this gives us a 2.3% improvement, which is small but an improvement nevertheless.
I'm not sure I understand the numbers though.
I've taken some measurements using 50,000,000 calls to a kprobe:
The difference between no hash and the hash based on 6 maps is 60.2ns/kprobe.
The difference between no hash and the hash based on a single map is 127.6ns/kprobe.
The difference between the hash based on 6 maps and the hash based on a single map is 67.4ns/kprobe.
Objectively, the hash for a single map is more than twice the cost of the hash for 6 maps. This is admittedly very small compared to the total cost of the kprobe (2.92us), and represents an increase of 2.3%.
For example:
The difference between the hash based on 6 maps and the hash based on a single map is 67.4ns/kprobe.
So my reading of this is that we are saving 67.4ns for every invocation of the hook.
But after you say that:
This is admittedly very small compared to the total cost of the kprobe (2.92us), and represents an increase of 2.3%.
So I'm not sure where the 2.3% comes from. Am I missing something?
I was: 67.4 / 2920 = 2.3%.
Could you add the measurements (maybe the absolute values?) for every run in the commit message so that it is available as part of git history?
Thanks!
Full string matches in argMatch selectors are limited to two strings per selector. This limitation is due to the complexity of looping over every byte of long strings. This commit reimplements this as a set of hash look ups to reduce complexity and permit an infinite number of full string matches.

BPF hash maps use jhash to hash their keys. jhash operates on 12 byte blocks, followed by a final block. It appears that the most efficient use of jhash is when the input is 1 byte longer than a multiple of 12. Using a single hash map would require all inputs to be the same length, as the key is a fixed length. This is inefficient as short strings will involve a lot of hashing of padding bytes. This commit implements six string hash maps, for padded strings of sizes 25, 49, 73, 97, 121, and 145 bytes.

In order to not confuse character buffers of different lengths where one buffer ends in padding and the other ends in matching 0s, the first byte of the padded buffer is the length. This permits this approach to be used for character buffers in addition to null-terminated strings.

Note, as a consequence of implementing hash look ups for string matching, we no longer need to specify trailing null characters ("\0") in tracing policies. Any trailing null characters will be removed internally before being padded and inserted into the maps.

I've taken some measurements using 50,000,000 calls to a kprobe:

* The difference between no hash and the hash based on 6 maps is 60.2ns/kprobe.
* The difference between no hash and the hash based on a single map is 127.6ns/kprobe.
* The difference between the hash based on 6 maps and the hash based on a single map is 67.4ns/kprobe.

Objectively, the hash for a single map is more than twice the cost of the hash for 6 maps. This is admittedly very small compared to the total cost of the kprobe (2.92us), and represents an increase of 2.3%.

Signed-off-by: Kevin Sheldrake <kevin.sheldrake@isovalent.com>
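As an illustration of the key layout this commit message describes, here is a small, self-contained C helper showing how a match string might be stripped of trailing NULs, length-prefixed and zero-padded to a chosen key size. The function and its name are made up for illustration; the real Tetragon code is structured differently.

```c
#include <stdint.h>
#include <string.h>

/* Illustrative only: build a fixed-size, zero-padded hash-map key from a
 * match string. key_size would be one of 25, 49, 73, 97, 121 or 145. */
static int pad_string_key(const char *s, size_t len,
                          uint8_t *key, size_t key_size)
{
	/* strip trailing '\0' characters supplied in the policy */
	while (len > 0 && s[len - 1] == '\0')
		len--;

	if (len > 255 || len + 1 > key_size)
		return -1;

	memset(key, 0, key_size);  /* zero padding */
	key[0] = (uint8_t)len;     /* length byte distinguishes padding from data */
	memcpy(key + 1, s, len);
	return 0;
}
```

The length prefix is what distinguishes a buffer that genuinely ends in zero bytes from a shorter buffer that merely picks up zero padding, which is what allows the same scheme to cover char_buf arguments.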
Prefix string matches are limited to two strings per selector. This limitation is due to the complexity of looping over every byte in long strings. This commit replaces the prefix matching with LPM TRIE look ups.

Prefix strings are stored in a LPM TRIE map with the prefixes set to the length of the strings (in bits). Look ups are performed by setting the look up prefix to the length of the string to look up (in bits) and calling the usual map look up helper. This permits any number of string prefixes to be matched against.

Signed-off-by: Kevin Sheldrake <kevin.sheldrake@isovalent.com>
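A minimal BPF C sketch of the LPM TRIE technique this commit message describes, assuming an illustrative map name and a 128-byte maximum prefix length; the actual Tetragon definitions differ.

```c
#include <linux/bpf.h>
#include <stdbool.h>
#include <bpf/bpf_helpers.h>

#define STRING_PREFIX_MAX_LENGTH 128  /* assumed limit for this sketch */

/* LPM trie keys always start with a prefix length in bits. */
struct string_prefix_lpm_key {
	__u32 prefixlen;
	__u8 data[STRING_PREFIX_MAX_LENGTH];
};

struct {
	__uint(type, BPF_MAP_TYPE_LPM_TRIE);
	__uint(max_entries, 1024);
	__type(key, struct string_prefix_lpm_key);
	__type(value, __u8);
	__uint(map_flags, BPF_F_NO_PREALLOC);  /* required for LPM tries */
} string_prefix_map SEC(".maps");

/* Userspace inserts each prefix with prefixlen = strlen(prefix) * 8.
 * In BPF, key->data already holds the candidate string; setting
 * prefixlen to its full bit length asks the trie whether any stored
 * prefix covers it. */
static inline bool prefix_match(struct string_prefix_lpm_key *key, __u32 len)
{
	key->prefixlen = len * 8;
	return bpf_map_lookup_elem(&string_prefix_map, key) != NULL;
}
```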
Postfix string matches are limited to two strings per selector. This limitation is due to the complexity of looping over every byte in long strings. This commit replaces the postfix matching with LPM TRIE look ups.

Postfix strings are stored (in reverse) in a LPM TRIE map with the prefixes set to the length of the strings (in bits). Look ups are performed by setting the look up prefix to the length of the string to look up (in bits), setting the data to the reverse of the string to look up, and calling the usual map look up helper. This permits any number of string postfixes to be matched against.

As LPM TRIE look ups are now used for postfix matching, the null characters (\0) are no longer required on the end of match strings. Any null characters found on the ends of the match strings will be gracefully removed.

Signed-off-by: Kevin Sheldrake <kevin.sheldrake@isovalent.com>
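The postfix case reuses the LPM TRIE sketch above; the extra step is copying the candidate string into the key in reverse, so that the end of the original string becomes the start of the trie key. The bounded per-byte loop below (with an assumed 128-byte limit and made-up names) is also the kind of copy that bounds postfix lengths on older kernels, as a later commit notes.

```c
#include <linux/types.h>

#define STRING_POSTFIX_MAX_LENGTH 128  /* assumed limit for this sketch */

/* Illustrative only: reverse-copy the last `len` bytes of `src` into the
 * LPM key data, so stored (reversed) postfixes act as trie prefixes. */
static inline void copy_reversed(__u8 *dst, const char *src, __u32 len)
{
	for (__u32 i = 0; i < STRING_POSTFIX_MAX_LENGTH; i++) {
		if (i >= len)
			break;
		dst[i] = src[len - 1 - i];
	}
}
```

After this copy, setting prefixlen to len * 8 and calling bpf_map_lookup_elem on the postfix trie works exactly as in the prefix sketch.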
Just like in the previous commit regarding string matches, this commit converts file and FD equal matches to hash look ups. It also covers notEqual matches. See previous commits for details.

Signed-off-by: Kevin Sheldrake <kevin.sheldrake@isovalent.com>
As for the previous commits that converted string prefix and postfix matches to LPM TRIE look ups, this commit converts file and FD prefix and postfix matches to the same. Also includes 'not' matches.

Note: TestMultipleMountPathFiltered on kernels <v5.2 has been modified as the policy specified a postfix path that was longer than could possibly be matched. The test now specifies a policy with a path that can be matched correctly. The test passed prior to this commit because the postfix comparison started at the end and stopped as soon as one of the strings was exhausted. This meant that the full specified postfix string wasn't checked. This was an incorrect result as the start of the path could have contained any directories and it would still have matched. The updated test now only contains a postfix path that can actually be checked.

Signed-off-by: Kevin Sheldrake <kevin.sheldrake@isovalent.com>
Having implemented all the equal/prefix/postfix matches in hashes, and having implemented the not versions for file/fd, now implement the not versions for string and char buffers.

Signed-off-by: Kevin Sheldrake <kevin.sheldrake@isovalent.com>
When the string and file matches relied on byte-for-byte matching in BPF, postfix matches were limited to 50 chars for >=5.4 and 40 chars for <5.4. With the new hash match look ups, the >=5.4 limit was raised to the standard 128 characters by default. For <5.4, we can now raise the 'small program' limit to 96 without exceeding the instruction count.

Signed-off-by: Kevin Sheldrake <kevin.sheldrake@isovalent.com>
Thanks! 🚀
Currently string, char buf, file, fd, and path matchers (equal, prefix, postfix, plus the 'not' equivalents for file, fd and path) are all written in BPF using loops to inspect (potentially) every character in the strings to be matched.
This series (see commits for details) converts all these matchers to hash map look ups, using a set of 6 hash maps of varying sizes for equal matches, and LPM TRIE maps for prefix and postfix look ups. In addition, the 'not' equivalents are implemented for string and char buf.
The default string lengths for the matches have been set at 128 bytes, but could be increased to an arbitrary size for equals, and up to 256 bytes for prefix and postfix. For small programs (<5.3), the postfix limit (which is bounded by a byte-by-byte reverse copy) has been raised from 40 chars to 96 chars. For large programs, the postfix limit is set at 128 chars (raised from 50 chars).