Provide consistent concept for DFG/EOG for Overlay nodes #1994

oxisto · 2025-01-28T09:38:17Z

We need a consistent concept for

DFG and EOG edges for the operations and concepts
Is it allowed to put DFG edges between concepts/operations?
Put the DFG edges on the "traditional world" based on the concepts
Guidance on where to place overlay nodes -> Is it allowed to have multiple "underlay" nodes? (currently NOT)
Does that affect the queries?
- For example, it can be "dangerous" to make DFG shortcuts, because then we no longer "see" the traditional DFG path and cannot check whether something happens on this path
Does it make sense to introduce a new ConceptGranularity to have a "concept-world" DFG edge?

maximiliankaul · 2025-02-04T16:44:14Z

Summary of today's discussion:

DFG and EOG edges

We will mark new DFG/EOG edges.
- This allows us to ignore these new edges in existing passes, thus not breaking existing behavior.
We will connect Operation nodes to the CPG nodes to allow for nicer queries ("all DFG paths from FileRead must ..." instead of "all nodes where an OverlayNode of type FileRead exists must ...").

```mermaid
graph TD
  FileRead["Operation[FileRead]"]
  CallExpr["CallExpression[read]"]

  FileRead -->|DFG| CallExpr
```

New edges will only be introduced wherever they provide a benefit.
- This allows us to connect nodes where the original CPG would not find a connection (e.g. HTTP endpoint and client call).
- This does not introduce shorter paths as no new edge is introduced if a DFG/... path is already present in the "CPG world"
We'll have to be careful as this can result in new loops.
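
The rules above can be sketched as follows. This is only an illustrative Python model, not the CPG library's API: the `Edge` and `Graph` classes, the `overlay` flag, and `add_overlay_edge` are hypothetical names. It shows a marked edge that existing traversals skip by default, and a new edge that is only added when the "CPG world" has no path yet.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Edge:
    src: str
    dst: str
    kind: str = "DFG"
    overlay: bool = False  # hypothetical marker for edges added for overlay nodes


@dataclass
class Graph:
    edges: list = field(default_factory=list)

    def successors(self, node, include_overlay):
        # Existing passes call this with include_overlay=False and never see new edges.
        return [e.dst for e in self.edges
                if e.src == node and (include_overlay or not e.overlay)]

    def has_path(self, start, end, include_overlay=False):
        # Iterative DFS with a visited set, so loops cannot cause non-termination.
        seen, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node == end:
                return True
            if node in seen:
                continue
            seen.add(node)
            stack.extend(self.successors(node, include_overlay))
        return False

    def add_overlay_edge(self, src, dst):
        # Only add an edge if the original graph does not already connect the nodes,
        # so no shortcut around an existing path is created.
        if self.has_path(src, dst, include_overlay=False):
            return False
        self.edges.append(Edge(src, dst, overlay=True))
        return True


g = Graph()
g.edges.append(Edge("a", "b"))
g.edges.append(Edge("b", "c"))
added_shortcut = g.add_overlay_edge("a", "c")  # a path a -> b -> c already exists
added_bridge = g.add_overlay_edge("http_client_call", "http_endpoint")
```

The visited set in `has_path` is one way to stay safe against the new loops mentioned above.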

Deciding if all paths contain a required `Concept` node

This is not easy, as there will always be one path that goes through the concept nodes and one that ignores them:

```mermaid
graph TD
    prev["before foo()"]
    foo["foo()"]
    after["after foo()"]
    concept["concept"]
    prev -->|DFG| foo -->|DFG| after
    concept -->|DFG| foo
    foo -->|DFG| concept
```

Currently, the best solution is to rephrase such a query to check for a node with an overlayNode of Type X on the path. However, @konradweiss has another suggestion to add to this discussion.
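
A rough sketch of the rephrased query, in plain Python rather than the CPG query API: the graph is a dictionary of "CPG world" DFG successors, and `overlays` maps a node to the types of its overlay nodes. Both structures and the function names are assumptions made for illustration.

```python
def simple_paths(dfg, start, end, path=None):
    """Yield every loop-free DFG path from start to end."""
    path = (path or []) + [start]
    if start == end:
        yield path
        return
    for nxt in dfg.get(start, []):
        if nxt not in path:  # do not revisit nodes, so cycles terminate
            yield from simple_paths(dfg, nxt, end, path)


def all_paths_touch(dfg, overlays, start, end, concept_type):
    """True if at least one path exists and every path contains a node
    that has an overlay node of the given type."""
    paths = list(simple_paths(dfg, start, end))
    return bool(paths) and all(
        any(concept_type in overlays.get(node, ()) for node in path)
        for path in paths
    )


overlays = {"foo()": {"X"}}

# Only the traditional DFG edges are followed; the concept is found via the overlay.
dfg = {"before foo()": ["foo()"], "foo()": ["after foo()"]}
holds = all_paths_touch(dfg, overlays, "before foo()", "after foo()", "X")

# A second path that bypasses foo() makes the query fail.
dfg_bypass = {"before foo()": ["foo()", "after foo()"], "foo()": ["after foo()"]}
bypassed = all_paths_touch(dfg_bypass, overlays, "before foo()", "after foo()", "X")
```

Because the check never walks edges into or out of the concept node itself, the "path ignoring the concept" problem does not arise in this formulation.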

Where to place nodes?

This is purely a design decision. Suggestion:

One node per "concept instance" (instead of one global node per concept)
The concept node is connected to the CPG node creating the object (e.g. API call / malloc / ...).
Operation nodes are connected to the respective CallExpressions.
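
To make the suggested placement concrete, here is a small data-structure sketch. The classes `CpgNode`, `Concept` and `Operation` and the two helper functions are hypothetical stand-ins, not the CPG's actual classes.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)
class CpgNode:
    code: str
    overlays: list = field(default_factory=list)


@dataclass(eq=False)
class Concept:
    name: str
    underlying: CpgNode  # the node creating the object (API call, malloc, ...)
    ops: list = field(default_factory=list)


@dataclass(eq=False)
class Operation:
    name: str
    concept: Concept
    underlying: CpgNode  # the call expression performing the operation


def attach_concept(name, creator):
    # One Concept per "concept instance", anchored at the creating node.
    concept = Concept(name, creator)
    creator.overlays.append(concept)
    return concept


def attach_operation(name, concept, call):
    # Operations hang off the respective call expression.
    op = Operation(name, concept, call)
    concept.ops.append(op)
    call.overlays.append(op)
    return op


open_a = CpgNode('open("a.txt")')
open_b = CpgNode('open("b.txt")')
file_a = attach_concept("File", open_a)
file_b = attach_concept("File", open_b)  # a second instance, not a shared global node

read_call = CpgNode("a.read()")
read_op = attach_operation("FileRead", file_a, read_call)
```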

Concept Reference

We have concepts with fields representing other concepts. E.g. a DiskEncryption concept requires a Secret as a key.

We'll introduce a ConceptReference or something similar.
Having a reference keeps us from dealing with code where we cannot decide which concrete instance of a concept is used (e.g. a Secret set in a branch being used outside the branch). However, this can still be resolved manually to the best of the CPG's ability.
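
One possible shape for such a reference, as a hedged sketch: the `ConceptReference` below simply holds the candidate instances and resolves only when exactly one remains. The class layout and the `resolve` method are assumptions for illustration; the thread does not fix a design.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)
class Secret:
    origin: str


@dataclass(eq=False)
class ConceptReference:
    # All concept instances that could be meant at this point in the code.
    candidates: list = field(default_factory=list)

    def resolve(self):
        # Resolve only if the choice is unambiguous; otherwise stay a reference.
        return self.candidates[0] if len(self.candidates) == 1 else None


@dataclass(eq=False)
class DiskEncryption:
    key: ConceptReference


# A Secret set in either branch and used after the branch: two candidates.
enc = DiskEncryption(
    key=ConceptReference([Secret("then-branch"), Secret("else-branch")])
)
ambiguous = enc.key.resolve()

# A single possible origin can be resolved to the concrete instance.
single = ConceptReference([Secret("getKey()")]).resolve()
```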

Multiple "underlay" nodes

Think of the following code snippet:

```
key = SomeLib.getKey()
foo(key)
bar(key)
```

Here, we will not create edges from the call arguments to the Secret concept. We will provide an (extension?) function allowing the end user to easily check if a given Reference (i.e. key) node can be a key by checking prevDFG. This is purely a convenience function and should result in easier-to-read queries.
There is also the option to connect all occurrences of key with the concept, but we do not see a benefit in this approach. This is purely a design decision and should yield equivalent results.
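
A minimal sketch of such a convenience check, written in Python over a plain dictionary of prevDFG edges. The function name `can_be_concept`, the node labels and the `overlays` mapping are hypothetical; the real helper would work on CPG nodes.

```python
def can_be_concept(node, prev_dfg, overlays, concept_type):
    """Walk prevDFG edges backwards from node and report whether any
    reachable node carries an overlay of the given concept type."""
    seen, stack = set(), [node]
    while stack:
        cur = stack.pop()
        if cur in seen:
            continue  # guards against DFG loops
        seen.add(cur)
        if concept_type in overlays.get(cur, ()):
            return True
        stack.extend(prev_dfg.get(cur, ()))
    return False


# prevDFG for the snippet above: both uses of key flow from the same assignment.
prev_dfg = {
    "key@foo": ["key@decl"],
    "key@bar": ["key@decl"],
    "key@decl": ["SomeLib.getKey()"],
}
# Only the creating call has the Secret concept attached.
overlays = {"SomeLib.getKey()": {"Secret"}}

foo_arg_is_key = can_be_concept("key@foo", prev_dfg, overlays, "Secret")
bar_arg_is_key = can_be_concept("key@bar", prev_dfg, overlays, "Secret")
```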

Do we have to require an `Operation` class?

The question on the necessity of having an Operation for creating a Concept was raised and has to be evaluated.

konradweiss · 2025-02-05T08:59:55Z

Predicate Evaluation on Overlay Nodes

My idea was to look at all our functions that search for a node along a path given a predicate p, e.g. followNextFullDFGUntilHit(). If the predicate does not evaluate to true on a node, it is then also evaluated on the overlay nodes connected to that node; if it evaluates to true there, the node and the overlay node are added to the current fulfilledPaths. If this evaluation can also be implemented so that a search can start at an overlay node and progress via its underlying node, I can imagine that we don't need DFG or EOG edges except in the cases where we explicitly add new information on data flows, e.g. reading from a file or connecting data over REST interfaces.
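
The idea can be sketched roughly as below. This is a simplified Python stand-in and not the signature or behaviour of the real followNextFullDFGUntilHit(); the function name, the dictionary-based graph and the returned pair of lists are assumptions for illustration.

```python
def follow_next_dfg_until_hit(start, next_dfg, overlays, predicate):
    """Follow DFG successors from start. A path is fulfilled when the predicate
    holds on a node or, failing that, on one of the node's overlay nodes."""
    fulfilled, failed = [], []

    def walk(node, path):
        path = path + [node]
        if predicate(node):
            fulfilled.append(path)
            return
        for overlay in overlays.get(node, ()):
            if predicate(overlay):
                # Record both the node and the overlay node that matched.
                fulfilled.append(path + [overlay])
                return
        successors = [n for n in next_dfg.get(node, ()) if n not in path]
        if not successors:
            failed.append(path)  # path ended without a hit
            return
        for nxt in successors:
            walk(nxt, path)

    walk(start, [])
    return fulfilled, failed


next_dfg = {"path": ["read(path)"], "read(path)": ["data"]}
overlays = {"read(path)": ["Operation[FileRead]"]}

hit, miss = follow_next_dfg_until_hit(
    "path", next_dfg, overlays, lambda n: n == "Operation[FileRead]"
)
no_hit, no_miss = follow_next_dfg_until_hit(
    "path", next_dfg, overlays, lambda n: n == "Operation[FileWrite]"
)
```

No DFG edge to or from the overlay node is needed here; the overlay is only consulted when the predicate fails on the underlying node.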

oxisto · 2025-02-11T14:12:46Z

@konradweiss will implement the DFG granularities

oxisto · 2025-02-25T05:37:00Z

We decided to add a property on the edge for this; it is currently used to differentiate the edges. Closing this for now.

oxisto assigned maximiliankaul and konradweiss Jan 28, 2025

oxisto added the DFG label Jan 28, 2025

oxisto closed this as completed Feb 25, 2025

oxisto mentioned this issue Feb 25, 2025

Provide a "summary" DFG granularity #2071

Open

This was referenced Feb 26, 2025

Follow through overlay #2074

Draft

No meaningful DFG/EOG queries possible which consider the overlay graph #2078

Open
