Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

SARIF output format #376

Merged
merged 58 commits into from
Nov 3, 2021
Merged

Conversation

AlexanderEichler
Copy link
Contributor

Integrates the Sarif Format as a possible output for Goblint.
It is called with the option --sarif.
An output file can be declared as normal with -o out.sarif (since Sarif is a Json file, the ending .json works too).
For testing the output there is a small script: scripts/createSarifTestOutput.sh.
One small error still exists in e.g. tests/regression/50-juliet/02, because a " is used in the warning text in Messages.ml Piece.t text.

The Github Action can be used in C repositories, by adding it to the .github/workflows folder and configuring the path to the main file.

Copy link
Member

@sim642 sim642 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Manually constructing JSON instead of using Yojson is super error-prone and clearly has already been an issue. Some " in a message was replaced with ' to avoid having to deal with JSON escaping issues.
The reasonable thing to do would be to let Yojson take care of JSON and just define the appropriate objects via OCaml types. Just like json-messages result, this won't be a bottleneck because these are just user-presentable messages. The only reason printXml for g2html uses ugly manual XML printing is because outputting all states is much more output and there it might be a bottleneck.

Comment on lines 108 to 133
let getBehaviorCategory (behavior:MessageCategory.behavior) = match behavior with
(* maybe CWE-589: Call to Non-ubiquitous API https://cwe.mitre.org/data/definitions/589.html*)
| Implementation-> "Implementation";
| Machine-> "Machine";
| Undefined u-> match u with
| NullPointerDereference -> "476";
| UseAfterFree -> "416"
| ArrayOutOfBounds arrayOutOfBounds -> match arrayOutOfBounds with
| PastEnd -> "788";
| BeforeStart -> "786:";
| Unknown -> "119"

let returnCategory (cat:MessageCategory.category)= match cat with
(* Assert is a category, that describer internal Goblint issues and has no real value in a Sarif output *)
| MessageCategory.Assert -> "Assert";
| MessageCategory.Deadcode -> "561";
| MessageCategory.Race -> "Race";
| MessageCategory.Unknown -> "Category Unknown";
(* Analyzer is a category, that describer internal Goblint issues and has no real value in a Sarif output *)
| MessageCategory.Analyzer -> "Analyzer";
| MessageCategory.Behavior b -> getBehaviorCategory b;
(* Cast of Type Missmatch is equal to the CWE 241. If MessageCategory.Cast is just an describing an internal Goblint issue, this CWE should be removed. *)
| MessageCategory.Cast c -> "241";
| MessageCategory.Integer i -> match i with
| Overflow -> "190";
| DivByZero -> "369"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What are these functions trying to achieve? If the message is related to a CWE, it should have a CWE tag directly. If it doesn't, then the origin of the message should be fixed instead of trying to associate message categories with CWEs after the fact.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok I thought I should attempt to match the Categories to a CWE, if possible. Some of them fit quite well. I can change it to just use the category instead.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've changed the implementation to now use the categories. I still think, that providing some of the CWE information makes a lot of sense. For example the category > NullPointerDereference has a link to CWE-476. Do you think that is ok.
I'm struggling to find a good description for some Goblint categories, I would love to hear some suggestions.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some of the categories were arbitrarily added in #255 and aren't even used anywhere at this point, so trying to attach specific meaning to them is difficult.

What I mean is that categories are more general than specific CWEs and it's unnecessary to relate categories to CWEs globally because the places where this category is used, already attach a CWE tag as well:

(if AD.is_null adr
then M.error ~category:M.Category.Behavior.Undefined.nullpointer_dereference ~tags:[CWE 476] "Must dereference NULL pointer"
else if AD.may_be_null adr
then M.warn ~category:M.Category.Behavior.Undefined.nullpointer_dereference ~tags:[CWE 476] "May dereference NULL pointer");

I guess for the NullPointerDereference category one can attach a specific CWE link, but this isn't true for others, which may contain similar issues under various CWEs. And adding the same information under NullPointerDereference and CWE 476 just duplicates this information in the SARIF output because the same message has both, no?

Copy link
Contributor Author

@AlexanderEichler AlexanderEichler Oct 11, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess for the NullPointerDereference category one can attach a specific CWE link, but this isn't true for others, which may contain similar issues under various CWEs. And adding the same information under NullPointerDereference and CWE 476 just duplicates this information in the SARIF output because the same message has both, no?

Well both would be shown under the list of rules that Goblint can analyse. But in the actual results only the CWE is used, if it is present (@michael-schwarz made this suggestion).

@sim642 sim642 changed the title Integration sarif SARIF output format Oct 8, 2021
Copy link
Member

@sim642 sim642 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I now did some fixing, cleanup and refactoring myself to get it suitable for merging.

@sim642 sim642 merged commit 950be9c into goblint:master Nov 3, 2021
@sim642
Copy link
Member

sim642 commented Nov 3, 2021

By the way, for anyone not on Slack, since I made a proper GitHub Action for Goblint, I tried it in action together with the SARIF support here: https://github.com/sim642/goblint-action-test. I think the Security tab is only visible to me there, so here's two screenshots:

Alerts list

image

First alert details

image

Since it doesn't require any location hacks anymore, it also shows the code snippet. Currently it's a bit silly because endLine = startLine + 4 is hardcoded since Goblint cannot detect it.
This also shows where which texts are used. Clearly the shortDescription for the race condition rule is way too long.

Anyway, I'm a bit skeptical about declaring rules for Goblint anyway because that requires maintaining a consistent and complete rule ID scheme, which isn't very doable without major effort (categorizing all the unknown warnings).

@sim642 sim642 mentioned this pull request Nov 3, 2021
# for free to join this conversation on GitHub. Already have an account? # to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants