new(libscap): dump ringbuffer contents after detecting corruption #1997

gnosek · 2024-08-07T10:33:12Z

What type of PR is this?

Uncomment one (or more) /kind <> lines:

/kind bug

/kind cleanup

/kind design

/kind documentation

/kind failing-test

/kind feature

Any specific area of the project related to this PR?

Uncomment one (or more) /area <> lines:

/area API-version

/area build

/area CI

/area driver-kmod

/area driver-bpf

/area driver-modern-bpf

/area libscap-engine-bpf

/area libscap-engine-gvisor

/area libscap-engine-kmod

/area libscap-engine-modern-bpf

/area libscap-engine-nodriver

/area libscap-engine-noop

/area libscap-engine-source-plugin

/area libscap-engine-savefile

/area libscap

/area libpman

/area libsinsp

/area tests

/area proposals

Does this PR require a change in the driver versions?

/version driver-API-version-major

/version driver-API-version-minor

/version driver-API-version-patch

/version driver-SCHEMA-version-major

/version driver-SCHEMA-version-minor

/version driver-SCHEMA-version-patch

What this PR does / why we need it:

Whenever we detect ring buffer corruption, our only diagnostic is "yeah, corrupted". Without a local repro it's basically impossible to fix these issues. This PR adds a hex dump of the whole ring buffer, annotated to simplify the analysis.

A snippet of sample output looks like:

RINGBUFFER DUMP[0x74e220190010] 00001980  00 00 00 00 00 00 00 c0 03 00 00 6b 93 42 2f 7b  97 e7 17 07 00 00 00 00 00 00 00 76 00 00 00 07  | ...........k.B/{ ...........v....
RINGBUFFER DUMP[0x74e220190010] lastread  ~~~~~~~~~~~~~~~~~~~~~1~~~~~~~~~~~t~~~~~~~~~~~~~~~~~~~~~~~~T~~~~~~~~~~~~~~~~~~~~~~~l~~~~~~~~~~~^~~
RINGBUFFER DUMP[0x74e220190010] next evt                                  <t************************T***********************l***********^**
RINGBUFFER DUMP[0x74e220190010] used      -------------------------------------------------------------------------------------------------
RINGBUFFER DUMP[0x74e220190010] 000019a0  00 02 00 00 00 08 00 50 00 c0 03 00 00 00 00 00  00 7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00  | .......P........ ..ELF...........
RINGBUFFER DUMP[0x74e220190010] lastread  ~~~n~~~~~~~~~~~~~~~~~~~~~~~0~~~~~~~~~~~~~~~~~~~~~~~~1~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
RINGBUFFER DUMP[0x74e220190010] next evt  ***n***********************0************************1********************************************
RINGBUFFER DUMP[0x74e220190010] used      -------------------------------------------------------------------------------------------------
RINGBUFFER DUMP[0x74e220190010] 000019c0  00 03 00 3e 00 01 00 00 00 00 00 00 00 00 00 00  00 40 00 00 00 00 00 00 00 30 cc 27 00 00 00 00  | ...>............ .@.......0.'....
RINGBUFFER DUMP[0x74e220190010] lastread  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
RINGBUFFER DUMP[0x74e220190010] next evt  *************************************************************************************************
RINGBUFFER DUMP[0x74e220190010] used      -------------------------------------------------------------------------------------------------
RINGBUFFER DUMP[0x74e220190010] 000019e0  00 00 00 00 00 40 00 38 00 0a 00 40 00 1a 00 19  00 01 00 00 00 04 00 00 00 00 00 00 00 00 00 00  | .....@.8...@.... ................
RINGBUFFER DUMP[0x74e220190010] lastread  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
RINGBUFFER DUMP[0x74e220190010] next evt  *************************************************************************************************
RINGBUFFER DUMP[0x74e220190010] used      -------------------------------------------------------------------------------------------------
RINGBUFFER DUMP[0x74e220190010] 00001a00  00 19 e3 42 2f 7b 97 e7 17 07 00 00 00 00 00 00  00 4e 00 00 00 a0 00 06 00 00 00 08 00 08 00 04  | ...B/{.......... .N..............
RINGBUFFER DUMP[0x74e220190010] lastread  ~~~t~~~~~~~~~~~~~~~~~~~~~~~T~~~~~~~~~~~~~~~~~~~~~~~~l~~~~~~~~~~~^~~~~~n~~~~~~~~~~~~~~~~~~~~~~~~~~
RINGBUFFER DUMP[0x74e220190010] next evt  ***t***********************T************************l***********^*****n**************************
RINGBUFFER DUMP[0x74e220190010] used      -------------------------------------------------------------------------------------------------
RINGBUFFER DUMP[0x74e220190010] 00001a20  00 04 00 08 00 08 00 00 00 00 00 00 00 00 00 00  20 28 00 00 00 00 00 01 00 00 00 02 00 00 00 03  | ................  (..............
RINGBUFFER DUMP[0x74e220190010] lastread  ~~~~~~~~~~~~~~~~~~~~~0~~~~~~~~~~~~~~~~~~~~~~~1~~~~~~~~~~~~~~~~~~~~~~~~2~~~~~~~~~~~3~~~~~~~~~~~4~~
RINGBUFFER DUMP[0x74e220190010] next evt  *********************0***********************1************************2***********3***********4**
RINGBUFFER DUMP[0x74e220190010] used      -------------------------------------------------------------------------------------------------
RINGBUFFER DUMP[0x74e220190010] 00001a40  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 25  2d 43 2f 7b 97 e7 17 07 00 00 00 00 00 00 00 36  | ...............% -C/{...........6
RINGBUFFER DUMP[0x74e220190010] lastread  ~~~~~~~~~~~~~~~~~~~~~5~~~~~~~~~~~~~~~~~~~~~~~t~~~~~~~~~~~~~~~~~~~~~~~~T~~~~~~~~~~~~~~~~~~~~~~~l~~
RINGBUFFER DUMP[0x74e220190010] next evt  *********************5***********************t************************T***********************l**
RINGBUFFER DUMP[0x74e220190010] used      -------------------------------------------------------------------------------------------------

(this goes on for megabytes, though all-zero rows are skipped)
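To make the row layout concrete, here is a minimal Python sketch that formats hex rows like those in the sample and skips all-zero rows. It is an illustration only, not the libscap implementation: the function name `hexdump_rows` and its parameters are hypothetical, and the `RINGBUFFER DUMP[...]` prefix and annotation lines are left out.

```python
def hexdump_rows(buf, base=0, width=32):
    """Yield one formatted line per `width`-byte row, skipping all-zero rows.

    Hypothetical helper: the layout is copied from the sample output
    (8-digit offset, two 16-byte halves, then an ASCII column).
    """
    half = width // 2

    def text(chunk):
        # Printable ASCII is shown as-is; everything else becomes '.'.
        return "".join(chr(b) if 0x20 <= b <= 0x7E else "." for b in chunk)

    for off in range(0, len(buf), width):
        row = buf[off:off + width]
        if not any(row):
            continue  # all-zero rows are not printed
        left = " ".join(f"{b:02x}" for b in row[:half])
        right = " ".join(f"{b:02x}" for b in row[half:])
        yield (f"{base + off:08x}  {left}  {right}"
               f"  | {text(row[:half])} {text(row[half:])}")


# One all-zero row followed by the first row of the sample output.
sample_row = bytes.fromhex(
    "00 00 00 00 00 00 00 c0 03 00 00 6b 93 42 2f 7b"
    "97 e7 17 07 00 00 00 00 00 00 00 76 00 00 00 07"
)
for line in hexdump_rows(bytes(32) + sample_row, base=0x1960):
    print(line)
```

Only the row at offset `00001980` is printed; the zero-filled row before it is dropped, which is what keeps a mostly empty buffer from producing megabytes of identical lines.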

There are three spans:

lastread (~) shows the current event batch (the region scap currently holds locked in the ring buffer); it should cover exactly a whole number of events
next evt (*) shows the unconsumed events from the current batch (starting with the event we were about to return when we detected the corruption)
used (-) shows the portion of the ringbuffer filled with event data
The marks on the lastread and next evt lines are:

t: timestamp (first field of each event)
T: tid
l: event length
^: event type (I ran out of cases for t)
n: number of parameters
0-9: individual parameters (if there are over 10, we'll go higher in the ascii table: 0123456789:;<=>?@abcdef...)
The three lines are mostly redundant, but since we're looking for corruption, let's not assume things make sense.
The marks on lastread and next evt should always coincide, unless something is severely broken.
The event-related markers are best effort (based on the current values of the ring buffer pointers). If they do not mark realistic events, then that is the corruption we're looking for.
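The marker positions can be reproduced with a short Python sketch. It assumes the packed little-endian header that the sample dump appears to show (8-byte timestamp, 8-byte tid, 4-byte length, 2-byte type, 4-byte parameter count) followed by one 16-bit length per parameter. All names here (`EVENT_HEADER`, `param_marker`, `event_field_offsets`, `column_of`) are hypothetical and not taken from libscap. Note that stepping through ASCII from `0` yields uppercase letters after `@`; whether the real dump switches to lowercase as listed above is not something this sketch settles.

```python
import struct

# Assumed packed header layout: ts (u64), tid (u64), len (u32),
# type (u16), nparams (u32), little-endian, 26 bytes in total.
EVENT_HEADER = struct.Struct("<QQIHI")


def param_marker(index):
    """Marker for parameter `index`: '0'..'9', then onward through ASCII."""
    return chr(ord("0") + index)


def event_field_offsets(buf, start):
    """Return (event_len, [(label, offset), ...]) for the event at buf[start].

    Best effort, like the dump itself: garbage input gives garbage offsets.
    """
    _ts, _tid, length, _type, nparams = EVENT_HEADER.unpack_from(buf, start)
    fields = [
        ("timestamp", start),
        ("tid", start + 8),
        ("length", start + 16),
        ("type", start + 20),
        ("nparams", start + 22),
    ]
    lens_off = start + EVENT_HEADER.size  # table of 16-bit parameter lengths
    data_off = lens_off + 2 * nparams     # first parameter's data
    for i in range(nparams):
        (plen,) = struct.unpack_from("<H", buf, lens_off + 2 * i)
        fields.append((param_marker(i), data_off))
        data_off += plen
    return length, fields


def column_of(byte_in_row):
    """Column of a byte in a 32-byte row: 3 characters per byte, plus one
    extra space between the two 16-byte halves."""
    return byte_in_row * 3 + (1 if byte_in_row >= 16 else 0)


# Rebuild the read event from the sample: it starts at byte 11 of row
# 00001980 and has tid 7, length 0x76, type 7 and two parameters (8 and 80 bytes).
event = (EVENT_HEADER.pack(0x17E7977B2F42936B, 7, 0x76, 7, 2)
         + struct.pack("<HH", 8, 80) + bytes(88))
length, fields = event_field_offsets(bytes(11) + event, 11)
for label, offset in fields:
    print(f"{label:>9}: row byte {offset % 32:2d}, column {column_of(offset % 32)}")
```

With these inputs the parameter offsets come out at bytes 41 and 49 of the buffer, that is bytes 9 and 17 of row `000019a0`, which is where the `0` and `1` marks sit in the sample.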

Special notes for your reviewer:

Does this PR introduce a user-facing change?:

new(libscap): upon detecting ring buffer corruption, an annotated dump of the whole ring buffer will be printed to stderr

poiana · 2024-08-07T10:33:28Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: gnosek

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~OWNERS~~ [gnosek]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Signed-off-by: Grzegorz Nosek <grzegorz.nosek@sysdig.com>

github-actions · 2024-08-07T10:48:33Z

Perf diff from master - unit tests

     0.43%     +0.98%  [.] libsinsp::events::is_unknown_event
     3.58%     +0.79%  [.] gzfile_read
     5.91%     -0.56%  [.] next
     3.75%     -0.55%  [.] sinsp_thread_manager::find_thread
     6.94%     -0.54%  [.] sinsp::next
     2.60%     +0.51%  [.] sinsp_thread_manager::get_thread_ref
     2.06%     +0.50%  [.] scap_event_decode_params
     1.14%     -0.49%  [.] sinsp_parser::event_cleanup
     4.94%     +0.45%  [.] sinsp_parser::process_event
     1.36%     -0.42%  [.] 0x00000000000e8384

Perf diff from master - scap file

     8.16%     +7.73%  [.] sinsp_filter_check::tostring
    16.52%     -4.93%  [.] sinsp_filter_check_event::extract_single
    23.24%     -3.38%  [.] sinsp_evt_formatter::tostring_withformat
     3.82%     +2.88%  [.] sinsp_filter_check::rawval_to_string
    15.96%     -2.43%  [.] sinsp_filter_check_thread::extract_single
     7.40%     -0.96%  [.] gzfile_read
     3.33%     +0.82%  [.] 0x00000000000a70c0
     3.98%     +0.53%  [.] sinsp_filter_check::get_field_info
     4.00%     +0.53%  [.] sinsp_parser::reset
     4.00%     +0.50%  [.] formatted_dump

Heap diff from master - unit tests

peak heap memory consumption: 0B
peak RSS (including heaptrack overhead): 0B
total memory leaked: 0B

Heap diff from master - scap file

peak heap memory consumption: 0B
peak RSS (including heaptrack overhead): 0B
total memory leaked: 0B

codecov · 2024-08-07T10:55:57Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 74.08%. Comparing base (2c2e9b0) to head (55ddc3f).

Additional details and impacted files

@@           Coverage Diff           @@
##           master    #1997   +/-   ##
=======================================
  Coverage   74.08%   74.08%           
=======================================
  Files         253      253           
  Lines       30766    30766           
  Branches     5408     5410    +2     
=======================================
+ Hits        22793    22794    +1     
+ Misses       7971     7944   -27     
- Partials        2       28   +26

Flag	Coverage Δ
libsinsp	`74.08% <ø> (+<0.01%)`	⬆️


poiana · 2024-08-07T11:24:31Z

LGTM label has been added.

Git tree hash: 8a53ce7af9a2746b6868923184ea344a8387ca6b

LucaGuerra · 2024-08-07T11:25:25Z

/milestone 0.18.0

poiana added release-note kind/feature New feature or request dco-signoff: yes area/libscap labels Aug 7, 2024

poiana added the size/XL label Aug 7, 2024

poiana requested review from hbrueckner and jasondellaluce August 7, 2024 10:33

poiana added the approved label Aug 7, 2024

new(libscap): dump ringbuffer contents after detecting corruption

55ddc3f

Signed-off-by: Grzegorz Nosek <grzegorz.nosek@sysdig.com>

gnosek force-pushed the ringbuffer-dump branch from 9d61682 to 55ddc3f Compare August 7, 2024 10:44

LucaGuerra approved these changes Aug 7, 2024

poiana assigned LucaGuerra Aug 7, 2024

poiana added the lgtm label Aug 7, 2024

poiana added this to the 0.18.0 milestone Aug 7, 2024

jasondellaluce approved these changes Aug 7, 2024

poiana assigned jasondellaluce Aug 7, 2024

poiana merged commit 5fa87bb into falcosecurity:master Aug 7, 2024
43 of 47 checks passed
