Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Provide consistent concept for DFG/EOG for Overlay nodes #1994

Closed
oxisto opened this issue Jan 28, 2025 · 4 comments
Closed

Provide consistent concept for DFG/EOG for Overlay nodes #1994

oxisto opened this issue Jan 28, 2025 · 4 comments
Assignees
Labels

Comments

@oxisto
Copy link
Member

oxisto commented Jan 28, 2025

We need a consistent concept for

  • DFG and EOG edges for the operations and concepts
  • Is it allowed to put DFG edges between concepts/operations?
  • Put the DFG edges on the "traditional world" based on the concepts
  • Guidance where to place overlay nodes -> Is it allowed to have multiple "underlay" nodes (currently NOT)
  • Does that effect the queries?
    • For example, it can be "dangerous", to make DFG shortcuts, because then we do not "see" the traditional DFG path anymore and cannot check if something happens on this path
  • Does it make sense to introduce a new ConceptGranularity to have a "concept-world" DFG edge
@maximiliankaul
Copy link
Contributor

Summary of today's discussion:

DFG and EOG edges

  • We will mark new DFG/EOG edges.
    • This allows us to ignore these new edges in existing passes, thus not breaking existing behavior.
  • We will connect Operation nodes to the CPG nodes to allow for nicer queries (all DFG paths from FileRead must... instead of all nodes, where an OverlayNode of type FileRead exists, must ...
graph TD
  FileRead["Operation[FileRead]"]
  CallExpr["CallExpression[read]"]

  FileRead -->|DFG| CallExpr
Loading
  • New edges will only be introduced wherever they provide a benefit.
    • This allows us to connect nodes where the original CPG would not find a connection (e.g. HTTP endpoint and client call).
    • This does not introduce shorter paths as no new edge is introduced if a DFG/... path is already present in the "CPG world"
  • We'll have to be careful as this can result in new loops.

Deciding if all paths contain a required Concept node

This is not easy, as there will always be the path using the concept nodes and the one ignoring it:

graph TD
    prev["before foo()"]
    foo["foo()"]
    after["after foo()"]
    concept["concept"]
    prev -->|DFG| foo -->|DFG| after
    concept -->|DFG| foo
    foo -->|DFG| concept
Loading

Currently, the best solution is to rephrase such a query to check for a node with an overlayNode of Type X on the path. However, @konradweiss has another suggestion to add to this discussion.

Where to place nodes?

This is purely a design decision. Suggestion:

  • One node per "concept instance" (instead of one global node per concept)
  • The concept node is connected to the CPG node creating the object (e.g. API call / malloc / ...).
  • Operation nodes are connected to the respective CallExpressions.

Concept Reference

We have concepts with fields representing other concepts. E.g. a DiskEncryption concept requires a Secret as a key.

  • We'll introduce a ConceptReference or something similar.
  • Having a reference keeps us from dealing with code where we cannot decide which concrete instance of a concept is used (e.g. a Secret set in a branch being used outside the branch). However, this can still be resolved manually to the best of the CPG's ability).

Multiple "underlay" nodes

Think of the following code snippet:

key = SomeLib.getKey()
foo(key)
bar(key)

Here, we will not create edges from the call arguments to the Secret concept. We will provide an (extension?) function allowing the end user to easily check if a given Reference (i.e. key) node can be a key by checking prevDFG. This is purely a convenience function and should result in easier to read queries.
There is also the option to connect all occurrences of key with the concept, however we do not see a benefit of this approach. This is purely a design decision and should yield equivalent results.

Do we have to require a Operation class

The question on the necessity of having an Operation for creating a Concept was raised and has to be evaluated.

@konradweiss
Copy link
Collaborator

Predicate Evaluation on Overlay Nodes

My Idea was to look at all our functions that search for a node along a path, given a predicate p, e.g. followNextFullDFGUntilHit(), and if the predicate does not evaluate to true, its is then also evaluated on the nodes connected overlay nodes, if it evaluates to true, the nodes and the overlay node are added to the current fulfilledPaths. If this evaluation can then also be implemented to allow starting at the overlay nodes and progress over the overlay nodes underlying node I can imagine that we don't need DFG or EOG edges outside of the cases were we explicitly add new information on data flows, e.g. reading from a file, connecting data over REST interfaces.

@oxisto
Copy link
Member Author

oxisto commented Feb 11, 2025

@konradweiss will implement the DFG granularities

@oxisto
Copy link
Member Author

oxisto commented Feb 25, 2025

We decided to have a property in the edge for this and this is currently used to differentiate the edges. Closing this for now.

# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants