-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? # to your account
Remove unnecessary copying in transformations #2368
Remove unnecessary copying in transformations #2368
Conversation
In C++11, string data is always null-terminated[1], and can be freely modified[2]. [1] https://stackoverflow.com/questions/6077189/will-stdstring-always-be-null-terminated-in-c11 [2] https://stackoverflow.com/questions/38875623/is-it-permitted-to-modify-the-internal-stdstring-buffer-returned-by-operator
Thanks @WGH-. Interesting proposal. Indeed, as mentioned in [2] and as we are using c++11, it is possible to access the string buffer by array idx, and the buffer is stored contiguously. As of our development branch, we have added the concept of ModSecString. It is not yet 100% ready, but the idea is to have an allocator to use a stack-based memory pool for each request, avoiding memory fragmentation. That stack allocator should be aware that space should be allocated contiguously. I am going to review it better and get back to you soon. Example of ModSecString - |
Basically, you want to bring back region-based memory management, which old ModSecurity used with help of |
Not really, the apr_pool was used everywhere. For some pieces of the code where the strings are pre-computed as the request are processed (ARGS and QUERY_STRING), it makes sense to avoid memory fragmentation. The conjunction of std::string_view and stacked (not dynamic) memory usage could be very useful performance-wise. That is different from having dynamically allocated memory pools. It is not about having a generic memory pool but use the resources more wisely. |
You're suggesting to use the call stack for variables associated with particular transaction? |
This optimization has very small performance improvement: increase from 117-118 requests per second to 121-122 RPS. It might start to matter more once other parts are optimized, but right now it's mostly code cleanup. |
Let me illustrate that with an example - Lets assume that we have a request at the URI: By the time that we achieve a given phase (request life cycle) the variables will be populated internally in modsec. To exemplifies, let's pick: ARGS, ARGS_GET, QUERY_STRING, REQUEST_URI, REQUEST_URI_RAW they will be populated as such: ARGSARGS:foo :: bar ARGS_GETARGS:foo :: bar QUERY_STRINGQUERY_STRING :: foo=bar&github=fine&a=b REQUEST_URIREQUEST_URI :: /modsec-is-cool/?foo=bar&github=fine&a=b REQUEST_URI_RAWREQUEST_URI_RAW :: /modsec-is-cool/?foo=bar&github=fine&a=b Notice that the content of variables (and keys), repeats. Using the example of the word fine, it is allocated for each of those variables. In this example, fine could be allocated for QUERY_STRING, and mapped using (string_view) for the other variables. Reducing memory usage and avoid extra malloc calls, consequently reducing memory fragmentation. In that described use case, the string_view for each variable will be a part of the Transaction class, that could be stacked allocated as - although dynamically - the parameters do not vary on size: offset (void *) and length (uint?). The above is just an example to illustrate a rationally; there are some corner cases. There is usage for a memory pool, especially in terms of disposable memory used in serialized operations such as transformations. First, let's see if it is possible to avoid unnecessary allocations, as you are suggesting in this very own patch. |
At a certain point, the improvements sound to be numerically inefficient. However, placing it in scale, they are not bad. The benefit here is in the order of magnitude of 3% ~ 4%. Not bad. In a busy server, I am sure that it does make a difference. More to come, as the code became cleaner, it easy to spot an improve what needs to be improved :) |
Not worked with that yet as I am fixing the xmlns action on the v3.1 branch. The faulty test case is very annoying. |
xmlns is now fixed. Going to fix the cppcheck warnings on v3/master. Afterward, I will proceed with the rebase for the v3.1-experimental branch. And finally, merge this. The issue regarding the xmlns was a missing attribution for the rule variable. Almost all variables work without the knowledge of the rule that it belongs to, except for two variables: RULE, and XML. During the Rule instance copy, the variable was being copied correctly. But, the reference to the Rule (inside the variable instance) was remained unchanged, point towards the old instance. In the case of xmlns the old Rule was deleted and the reference was being pointed to an invalid location hence the crash. Relevant pieces of codeBase class common to XML and RULEPopulating all Variables with base type RuleVariable during a Rule instance copyXML Variable |
I think most old-school pointers can be safely replaced with |
In this particular case, we are using a shared_ptr. The idea is that Rules could belong to different hosts and so its variables. The variables that do not demand specific processing on execution time, are in practice anchors to offsets on the Transaction object. Safe to share among different Rules. |
@@ -61,7 +50,7 @@ std::string Utf8ToUnicode::evaluate(const std::string &value, | |||
} | |||
|
|||
|
|||
char *Utf8ToUnicode::inplace(unsigned char *input, | |||
char *Utf8ToUnicode::inplace(const unsigned char *input, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not use the in-place method here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, apart from the name the function is not really in place.
I have rebased and applied this PR at the top of v3.1-experimental. Commit: 0cc0b929c36374084e2e607673ab88e29c848fb0 Some changes were placed as a consequence of the rebase. |
It is now - 5226c42 |
Merged! Thanks! |
In C++11, string data is always null-terminated[1], and can be freely modified[2].
[1] https://stackoverflow.com/questions/6077189/will-stdstring-always-be-null-terminated-in-c11
[2] https://stackoverflow.com/questions/38875623/is-it-permitted-to-modify-the-internal-stdstring-buffer-returned-by-operator