new(libscap): dump ringbuffer contents after detecting corruption #1997

Merged: 1 commit, Aug 7, 2024

@gnosek gnosek commented Aug 7, 2024

What type of PR is this?

Uncomment one (or more) /kind <> lines:

/kind bug

/kind cleanup

/kind design

/kind documentation

/kind failing-test

/kind feature

Any specific area of the project related to this PR?

Uncomment one (or more) /area <> lines:

/area API-version

/area build

/area CI

/area driver-kmod

/area driver-bpf

/area driver-modern-bpf

/area libscap-engine-bpf

/area libscap-engine-gvisor

/area libscap-engine-kmod

/area libscap-engine-modern-bpf

/area libscap-engine-nodriver

/area libscap-engine-noop

/area libscap-engine-source-plugin

/area libscap-engine-savefile

/area libscap

/area libpman

/area libsinsp

/area tests

/area proposals

Does this PR require a change in the driver versions?

/version driver-API-version-major

/version driver-API-version-minor

/version driver-API-version-patch

/version driver-SCHEMA-version-major

/version driver-SCHEMA-version-minor

/version driver-SCHEMA-version-patch

What this PR does / why we need it:

Whenever we detect ring buffer corruption, our only diagnostic is "yeah, corrupted". Without a local repro it's basically impossible to fix these issues. This PR adds a hex dump of the whole ring buffer, annotated to simplify the analysis.

A snippet of sample output looks like:

RINGBUFFER DUMP[0x74e220190010] 00001980  00 00 00 00 00 00 00 c0 03 00 00 6b 93 42 2f 7b  97 e7 17 07 00 00 00 00 00 00 00 76 00 00 00 07  | ...........k.B/{ ...........v....
RINGBUFFER DUMP[0x74e220190010] lastread  ~~~~~~~~~~~~~~~~~~~~~1~~~~~~~~~~~t~~~~~~~~~~~~~~~~~~~~~~~~T~~~~~~~~~~~~~~~~~~~~~~~l~~~~~~~~~~~^~~
RINGBUFFER DUMP[0x74e220190010] next evt                                  <t************************T***********************l***********^**
RINGBUFFER DUMP[0x74e220190010] used      -------------------------------------------------------------------------------------------------
RINGBUFFER DUMP[0x74e220190010] 000019a0  00 02 00 00 00 08 00 50 00 c0 03 00 00 00 00 00  00 7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00  | .......P........ ..ELF...........
RINGBUFFER DUMP[0x74e220190010] lastread  ~~~n~~~~~~~~~~~~~~~~~~~~~~~0~~~~~~~~~~~~~~~~~~~~~~~~1~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
RINGBUFFER DUMP[0x74e220190010] next evt  ***n***********************0************************1********************************************
RINGBUFFER DUMP[0x74e220190010] used      -------------------------------------------------------------------------------------------------
RINGBUFFER DUMP[0x74e220190010] 000019c0  00 03 00 3e 00 01 00 00 00 00 00 00 00 00 00 00  00 40 00 00 00 00 00 00 00 30 cc 27 00 00 00 00  | ...>............ .@.......0.'....
RINGBUFFER DUMP[0x74e220190010] lastread  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
RINGBUFFER DUMP[0x74e220190010] next evt  *************************************************************************************************
RINGBUFFER DUMP[0x74e220190010] used      -------------------------------------------------------------------------------------------------
RINGBUFFER DUMP[0x74e220190010] 000019e0  00 00 00 00 00 40 00 38 00 0a 00 40 00 1a 00 19  00 01 00 00 00 04 00 00 00 00 00 00 00 00 00 00  | .....@.8...@.... ................
RINGBUFFER DUMP[0x74e220190010] lastread  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
RINGBUFFER DUMP[0x74e220190010] next evt  *************************************************************************************************
RINGBUFFER DUMP[0x74e220190010] used      -------------------------------------------------------------------------------------------------
RINGBUFFER DUMP[0x74e220190010] 00001a00  00 19 e3 42 2f 7b 97 e7 17 07 00 00 00 00 00 00  00 4e 00 00 00 a0 00 06 00 00 00 08 00 08 00 04  | ...B/{.......... .N..............
RINGBUFFER DUMP[0x74e220190010] lastread  ~~~t~~~~~~~~~~~~~~~~~~~~~~~T~~~~~~~~~~~~~~~~~~~~~~~~l~~~~~~~~~~~^~~~~~n~~~~~~~~~~~~~~~~~~~~~~~~~~
RINGBUFFER DUMP[0x74e220190010] next evt  ***t***********************T************************l***********^*****n**************************
RINGBUFFER DUMP[0x74e220190010] used      -------------------------------------------------------------------------------------------------
RINGBUFFER DUMP[0x74e220190010] 00001a20  00 04 00 08 00 08 00 00 00 00 00 00 00 00 00 00  20 28 00 00 00 00 00 01 00 00 00 02 00 00 00 03  | ................  (..............
RINGBUFFER DUMP[0x74e220190010] lastread  ~~~~~~~~~~~~~~~~~~~~~0~~~~~~~~~~~~~~~~~~~~~~~1~~~~~~~~~~~~~~~~~~~~~~~~2~~~~~~~~~~~3~~~~~~~~~~~4~~
RINGBUFFER DUMP[0x74e220190010] next evt  *********************0***********************1************************2***********3***********4**
RINGBUFFER DUMP[0x74e220190010] used      -------------------------------------------------------------------------------------------------
RINGBUFFER DUMP[0x74e220190010] 00001a40  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 25  2d 43 2f 7b 97 e7 17 07 00 00 00 00 00 00 00 36  | ...............% -C/{...........6
RINGBUFFER DUMP[0x74e220190010] lastread  ~~~~~~~~~~~~~~~~~~~~~5~~~~~~~~~~~~~~~~~~~~~~~t~~~~~~~~~~~~~~~~~~~~~~~~T~~~~~~~~~~~~~~~~~~~~~~~l~~
RINGBUFFER DUMP[0x74e220190010] next evt  *********************5***********************t************************T***********************l**
RINGBUFFER DUMP[0x74e220190010] used      -------------------------------------------------------------------------------------------------

(this goes on for megabytes, though all-zero rows are skipped)
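The row layout above can be approximated with a short sketch. This is not the libscap implementation, only an illustration of the format visible in the sample: 32 bytes per row, split into two halves of 16, followed by a printable-ASCII column, with all-zero rows skipped. The function name `hexdump_rows` is hypothetical, and the `RINGBUFFER DUMP[...]` prefix is left out.

```python
def hexdump_rows(buf: bytes, row_len: int = 32):
    """Yield one formatted line per non-zero row of buf."""
    half = row_len // 2
    for off in range(0, len(buf), row_len):
        row = buf[off:off + row_len]
        if not any(row):
            continue  # all-zero rows carry no information, so skip them
        halves = (row[:half], row[half:])
        # Hex column: bytes separated by one space, halves by two spaces.
        hex_part = "  ".join(" ".join(f"{b:02x}" for b in h) for h in halves)
        # Text column: printable ASCII as-is, everything else as a dot.
        text_part = " ".join(
            "".join(chr(b) if 32 <= b < 127 else "." for b in h) for h in halves
        )
        yield f"{off:08x}  {hex_part}  | {text_part}"


if __name__ == "__main__":
    # One all-zero row (skipped) followed by the first row of the sample.
    sample = bytes(32) + bytes.fromhex(
        "00000000000000c00300006b93422f7b"
        "97e71707000000000000007600000007"
    )
    for line in hexdump_rows(sample):
        print(line)
```

Skipping zero rows keeps the output manageable when most of a multi-megabyte buffer has never been written.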

There are three spans:

lastread (~) shows the current event batch (what scap is currently locking in the ring buffer); it should cover exactly a whole series of events
next evt (*) shows the unconsumed events from the current batch (starting with the event we were about to return when we detected the corruption)
used (-) shows the portion of the ring buffer filled with event data
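As a rough sketch of how one such span line could be rendered under a hex row: each byte takes three columns, with one extra column between the two 16-byte halves, and every byte inside the span gets the fill character. The names `byte_col` and `span_line` are hypothetical. The sketch assumes a span given as a plain `[start, end)` byte range and ignores both ring buffer wrap-around and the `<` start marker visible in the sample.

```python
ROW_LEN = 32  # bytes per dump row, as in the sample output


def byte_col(i: int) -> int:
    """Column where byte i of a row starts in the hex area."""
    return 3 * i + (1 if i >= ROW_LEN // 2 else 0)


def span_line(row_off: int, start: int, end: int, fill: str) -> str:
    """Annotation line for the row at row_off, marking bytes in [start, end)."""
    line = [" "] * byte_col(ROW_LEN)
    for i in range(ROW_LEN):
        if start <= row_off + i < end:
            # Fill up to the next byte's column so the line has no holes.
            lo, hi = byte_col(i), byte_col(i + 1)
            line[lo:hi] = fill * (hi - lo)
    return "".join(line)


if __name__ == "__main__":
    # A span starting at byte 11 of the row at 0x1980.
    print(repr(span_line(0x1980, 0x198B, 0x19A0, "*")))
```

Field markers such as `t` or `l` would then overwrite single columns of this line at `byte_col(i)` for the byte where each field starts.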
The marks on the lastread and next evt lines are:

t: timestamp (first field of each event)
T: tid
l: event length
^: event type (I ran out of cases for t)
n: number of parameters
0-9: individual parameters (if there are over 10, we'll go higher in the ascii table: 0123456789:;<=>?@abcdef...)
The three span lines are mostly redundant, but since we're looking for corruption, let's not assume things make sense.
The marks on lastread and next evt should always coincide, unless something is severely broken.
The event-related markers are best effort (based on the current values of the ring buffer pointers), but if they do not mark realistic events, then that is the corruption we're looking for.
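To make the markers concrete, here is a minimal sketch that computes the mark positions for one event. It is not the code from this PR. It assumes a packed little-endian header of timestamp (u64), tid (u64), length (u32), type (u16) and parameter count (u32), followed by one 16-bit length per parameter and then the parameter data. The marker letters follow the sample output, where the first header field gets `t` and the second `T`. The names `event_marks` and `param_mark` are hypothetical, and `param_mark` simply assumes "going higher in the ASCII table" means `'0' + index`.

```python
import struct

# Assumed event header layout (little-endian, packed):
# ts (u64), tid (u64), len (u32), type (u16), nparams (u32) = 26 bytes
HDR = struct.Struct("<QQIHI")


def param_mark(i: int) -> str:
    """Marker for parameter i: '0'..'9', then ':', ';', '<', ... (assumed)."""
    return chr(ord("0") + i)


def event_marks(buf: bytes, off: int):
    """Return ({byte offset: marker}, event length) for the event at off."""
    ts, tid, length, etype, nparams = HDR.unpack_from(buf, off)
    marks = {off: "t", off + 8: "T", off + 16: "l", off + 20: "^", off + 22: "n"}
    lens_off = off + HDR.size
    # Corrupted data may claim more parameters than fit; clamp to the buffer.
    nparams = min(nparams, max(0, (len(buf) - lens_off) // 2))
    lens = struct.unpack_from(f"<{nparams}H", buf, lens_off)
    pos = lens_off + 2 * nparams  # parameter data starts after the length array
    for i, plen in enumerate(lens):
        marks[pos] = param_mark(i)
        pos += plen
    return marks, length


if __name__ == "__main__":
    # The event that starts at offset 0x1a01 in the sample dump.
    event = bytes.fromhex(
        "19e3422f7b97e717" "0700000000000000" "4e000000" "a000" "06000000"
        "080008000400040008000800"
        "0000000000000000" "0020280000000000" "01000000" "02000000"
        "0300000000000000" "0000000000000000"
    )
    marks, length = event_marks(b"\x00" + event, 1)
    print(length, marks)
```

When the header fields are garbage, the resulting marks land in implausible places, which is exactly the signal the dump is meant to expose.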

Special notes for your reviewer:

Does this PR introduce a user-facing change?:

new(libscap): upon detecting ring buffer corruption, an annotated dump of the whole ring buffer will be printed to stderr


poiana commented Aug 7, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: gnosek

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Signed-off-by: Grzegorz Nosek <grzegorz.nosek@sysdig.com>

github-actions bot commented Aug 7, 2024

Perf diff from master - unit tests

     0.43%     +0.98%  [.] libsinsp::events::is_unknown_event
     3.58%     +0.79%  [.] gzfile_read
     5.91%     -0.56%  [.] next
     3.75%     -0.55%  [.] sinsp_thread_manager::find_thread
     6.94%     -0.54%  [.] sinsp::next
     2.60%     +0.51%  [.] sinsp_thread_manager::get_thread_ref
     2.06%     +0.50%  [.] scap_event_decode_params
     1.14%     -0.49%  [.] sinsp_parser::event_cleanup
     4.94%     +0.45%  [.] sinsp_parser::process_event
     1.36%     -0.42%  [.] 0x00000000000e8384

Perf diff from master - scap file

     8.16%     +7.73%  [.] sinsp_filter_check::tostring
    16.52%     -4.93%  [.] sinsp_filter_check_event::extract_single
    23.24%     -3.38%  [.] sinsp_evt_formatter::tostring_withformat
     3.82%     +2.88%  [.] sinsp_filter_check::rawval_to_string
    15.96%     -2.43%  [.] sinsp_filter_check_thread::extract_single
     7.40%     -0.96%  [.] gzfile_read
     3.33%     +0.82%  [.] 0x00000000000a70c0
     3.98%     +0.53%  [.] sinsp_filter_check::get_field_info
     4.00%     +0.53%  [.] sinsp_parser::reset
     4.00%     +0.50%  [.] formatted_dump

Heap diff from master - unit tests

peak heap memory consumption: 0B
peak RSS (including heaptrack overhead): 0B
total memory leaked: 0B

Heap diff from master - scap file

peak heap memory consumption: 0B
peak RSS (including heaptrack overhead): 0B
total memory leaked: 0B


codecov bot commented Aug 7, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 74.08%. Comparing base (2c2e9b0) to head (55ddc3f).

Additional details and impacted files
@@           Coverage Diff           @@
##           master    #1997   +/-   ##
=======================================
  Coverage   74.08%   74.08%           
=======================================
  Files         253      253           
  Lines       30766    30766           
  Branches     5408     5410    +2     
=======================================
+ Hits        22793    22794    +1     
+ Misses       7971     7944   -27     
- Partials        2       28   +26     
Flag Coverage Δ
libsinsp 74.08% <ø> (+<0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.



poiana commented Aug 7, 2024

LGTM label has been added.

Git tree hash: 8a53ce7af9a2746b6868923184ea344a8387ca6b

@LucaGuerra

/milestone 0.18.0

@poiana poiana added this to the 0.18.0 milestone Aug 7, 2024
@poiana poiana merged commit 5fa87bb into falcosecurity:master Aug 7, 2024
43 of 47 checks passed