new(libscap): dump ringbuffer contents after detecting corruption #1997
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: gnosek
Signed-off-by: Grzegorz Nosek <grzegorz.nosek@sysdig.com>
Codecov Report
All modified and coverable lines are covered by tests ✅
Additional details and impacted files
@@ Coverage Diff @@
## master #1997 +/- ##
=======================================
Coverage 74.08% 74.08%
=======================================
Files 253 253
Lines 30766 30766
Branches 5408 5410 +2
=======================================
+ Hits 22793 22794 +1
+ Misses 7971 7944 -27
- Partials 2 28 +26
Flags with carried forward coverage won't be shown. ☔ View full report in Codecov by Sentry.
LGTM label has been added. Git tree hash: 8a53ce7af9a2746b6868923184ea344a8387ca6b
/milestone 0.18.0
What type of PR is this?
/kind feature
Any specific area of the project related to this PR?
/area libscap
Does this PR require a change in the driver versions?
What this PR does / why we need it:
Whenever we detect ring buffer corruption, our only diagnostic is "yeah, corrupted". Without a local repro it's basically impossible to fix these issues. This PR adds a hex dump of the whole ring buffer, annotated to simplify the analysis.
A snippet of sample output looks like:
(this goes on for megabytes, though all-zero rows are skipped)
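The row-skipping behaviour described above can be sketched in C. This is a minimal illustration, not the code from this PR: the 16-byte row width, the offset prefix format and the `dump_ring_hex` name are all assumptions made for the example.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define ROW_BYTES 16 /* bytes per hex dump row (assumed width) */

/* Returns true if every byte in [p, p + n) is zero. */
static bool row_is_zero(const uint8_t *p, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (p[i] != 0) {
            return false;
        }
    }
    return true;
}

/*
 * Hypothetical helper: print `buf` as rows of ROW_BYTES hex bytes,
 * each prefixed with its offset, skipping rows that are entirely zero.
 * Returns the number of rows actually printed.
 */
static size_t dump_ring_hex(const uint8_t *buf, size_t len, FILE *out)
{
    size_t printed = 0;
    for (size_t off = 0; off < len; off += ROW_BYTES) {
        size_t n = len - off < ROW_BYTES ? len - off : ROW_BYTES;
        if (row_is_zero(buf + off, n)) {
            continue; /* all-zero rows carry no information */
        }
        fprintf(out, "%08zx ", off);
        for (size_t i = 0; i < n; i++) {
            fprintf(out, " %02x", buf[off + i]);
        }
        fputc('\n', out);
        printed++;
    }
    return printed;
}
```

Skipping zero rows is what keeps a dump of a mostly empty multi-megabyte buffer readable; because every printed row carries its offset, gaps left by skipped rows remain visible.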
There are three spans:
lastread (~) shows the current event batch (what scap currently has locked in the ring buffer) and should cover exactly a series of complete events
next evt (*) shows the unconsumed events from the current batch (starting with the next event we were about to return when we detected corruption)
used (-) shows the portion of the ringbuffer filled with event data
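One way to draw such a span line under a hex row is sketched below. The `render_span` and `in_ring_span` names and the three-columns-per-byte layout (matching a `" %02x"` hex format) are assumptions for illustration. Since this is a ring buffer, a span whose start lies past its end is treated as wrapping around the end of the buffer; treating `start == end` as an empty span is also an assumption, because a real ring buffer has to tell "empty" and "full" apart by other means.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Is offset `pos` inside the span [start, end)?
 * If start > end, the span wraps past the end of the ring buffer.
 * Assumption: start == end is treated as an empty span.
 */
static bool in_ring_span(size_t pos, size_t start, size_t end)
{
    if (start <= end) {
        return pos >= start && pos < end;
    }
    return pos >= start || pos < end; /* wrapped span */
}

/*
 * Hypothetical helper: fill `line` with `mark` under every byte of the
 * row [row_off, row_off + row_len) that lies inside the span, and with
 * spaces elsewhere. Each byte takes 3 columns (" xx").
 * `line` must hold at least 3 * row_len + 1 chars.
 */
static void render_span(char *line, size_t row_off, size_t row_len,
                        size_t start, size_t end, char mark)
{
    memset(line, ' ', 3 * row_len);
    for (size_t i = 0; i < row_len; i++) {
        if (in_ring_span(row_off + i, start, end)) {
            line[3 * i + 1] = mark;
            line[3 * i + 2] = mark;
        }
    }
    line[3 * row_len] = '\0';
}
```

Calling this once per span (`~`, `*`, `-`) for every printed row yields one marker line per span directly beneath the corresponding hex bytes.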
The marks on the lastread and next evt lines are:
T: timestamp (first field of each event)
t: tid
l: event length
^: event type (I ran out of letter cases for t)
n: number of parameters
0-9: individual parameters (if there are more than 10, we continue up the ASCII table: 0123456789:;<=>?@abcdef...)
The three lines are mostly redundant, but since we're looking for corruption, let's not assume things make sense.
The marks on lastread and next evt should always coincide, unless something is severely broken.
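The per-event marks listed above can be sketched as a function that annotates one event. The header layout used here (packed: `ts` u64, `tid` u64, `len` u32, `type` u16, `nparams` u32, 26 bytes, followed by `nparams` u16 parameter lengths) is an assumption modelled on the scap event header and may not match every event variant; the `mark_event` name and the choice to mark parameter data bytes are hypothetical too. Values are read in host byte order, assuming the buffer was written on the same machine.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define HDR_SIZE 26 /* assumed packed header size */

/* Marker characters for parameters, following the sequence in the text. */
static const char PARAM_MARKS[] =
    "0123456789:;<=>?@abcdefghijklmnopqrstuvwxyz";

/* Set marks[from, from + n) to c, clipped to the buffer size. */
static void fill_marks(char *marks, size_t size, size_t from, size_t n, char c)
{
    for (size_t i = from; i < from + n && i < size; i++) {
        marks[i] = c;
    }
}

/*
 * Hypothetical helper: annotate the event starting at `ev` in `buf`
 * (size `size`) by writing marker characters into `marks` (same size).
 * Returns the event length read from the header, or 0 if the header
 * does not fit. No sanity checks: on a corrupted buffer the marks land
 * in odd places, which is exactly what the dump should reveal.
 */
static uint32_t mark_event(const uint8_t *buf, char *marks, size_t size, size_t ev)
{
    uint32_t len, nparams;
    if (ev > size || size - ev < HDR_SIZE) {
        return 0;
    }
    memcpy(&len, buf + ev + 16, sizeof len);
    memcpy(&nparams, buf + ev + 22, sizeof nparams);

    fill_marks(marks, size, ev, 8, 'T');      /* timestamp */
    fill_marks(marks, size, ev + 8, 8, 't');  /* tid */
    fill_marks(marks, size, ev + 16, 4, 'l'); /* event length */
    fill_marks(marks, size, ev + 20, 2, '^'); /* event type */
    fill_marks(marks, size, ev + 22, 4, 'n'); /* number of parameters */

    /* Parameter data follows the array of u16 parameter lengths. */
    size_t lens = ev + HDR_SIZE;
    size_t data = lens + 2 * (size_t)nparams;
    for (uint32_t i = 0; i < nparams && lens + 2 * (size_t)i + 2 <= size; i++) {
        uint16_t plen;
        memcpy(&plen, buf + lens + 2 * (size_t)i, sizeof plen);
        char c = i < sizeof PARAM_MARKS - 1 ? PARAM_MARKS[i] : '#';
        fill_marks(marks, size, data, plen, c);
        data += plen;
    }
    return len;
}
```

Annotating a whole line would mean calling this repeatedly, advancing `ev` by the returned length, which is also why the marks on the lastread and next evt lines should coincide: both walks follow the same length fields.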
The event-related markers are best effort (based on the current values of the ring buffer pointers), but if they do not mark realistic events, then that is the corruption we're looking for.
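A "realistic events" check of this kind can be sketched as a walk over a span that stops at the first implausible header. The `first_bad_event` name, the 26-byte header size and the length field at offset 16 are assumptions carried over from the scap header layout, and the sketch assumes the span does not wrap around the end of the buffer.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define HDR_SIZE 26   /* assumed packed header size (ts, tid, len, type, nparams) */
#define LEN_OFFSET 16 /* assumed offset of the u32 event length */

/*
 * Hypothetical check: walk events from `start` and report whether they
 * tile [start, end) exactly. Returns `end` if they do; otherwise returns
 * the offset of the first event whose header is implausible (truncated,
 * shorter than a header, or running past `end`).
 */
static size_t first_bad_event(const uint8_t *buf, size_t start, size_t end)
{
    size_t pos = start;
    while (pos < end) {
        uint32_t len;
        if (end - pos < HDR_SIZE) {
            return pos; /* not even a full header left */
        }
        memcpy(&len, buf + pos + LEN_OFFSET, sizeof len);
        if (len < HDR_SIZE || len > end - pos) {
            return pos; /* this length cannot describe a real event here */
        }
        pos += len;
    }
    return pos; /* pos == end: the span is covered by whole events */
}
```

Applied to the lastread span, any return value other than `end` points at the offset where the "series of events" stops making sense, which is a natural place to start reading the dump.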
Special notes for your reviewer:
Does this PR introduce a user-facing change?: