Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

New sketch of API for dataflow queries #2025

Merged
merged 68 commits into from
Feb 18, 2025
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
68 commits
Select commit Hold shift + click to select a range
54a5f15
New sketch of API for dataflow queries
KuechA Feb 4, 2025
e300611
Some more initial stuff
KuechA Feb 5, 2025
f792d29
use the dataflow function
KuechA Feb 7, 2025
58c402c
Also provide EOG following base
KuechA Feb 7, 2025
d48c1fc
Merge branch 'main' into ak/dataflow-extension
KuechA Feb 7, 2025
4f7a3b0
Merge branch 'main' into ak/dataflow-extension
KuechA Feb 11, 2025
3290c33
Simplify code
KuechA Feb 12, 2025
fca961d
Try new version of allNonLiteralsFromFlowTo function
KuechA Feb 12, 2025
bb539ce
replace usage of all literals... function
KuechA Feb 13, 2025
fdc4ae3
Fix bugs
KuechA Feb 13, 2025
4266bd9
Merge branch 'main' into ak/dataflow-extension
KuechA Feb 13, 2025
c0a5590
Updated interface
KuechA Feb 13, 2025
d01ffb1
Move dataflow and eog related queries to own file
KuechA Feb 14, 2025
d68081a
Source validator sink query
KuechA Feb 14, 2025
dd96d89
Add implicit dataflow flag
KuechA Feb 14, 2025
240da6d
Merge branch 'main' into ak/dataflow-extension
KuechA Feb 14, 2025
650d55f
Add early termination options to followX extensions
KuechA Feb 14, 2025
9272a55
More flow queries: improved validator and allFlowTo implementation
KuechA Feb 14, 2025
524bced
Correct condition
KuechA Feb 14, 2025
869abe7
Start testing
KuechA Feb 14, 2025
64bfb2d
Test for intraprocedural backwards analyses
KuechA Feb 14, 2025
b8d1462
Extend tests for bidirectional test and following b
KuechA Feb 14, 2025
b6e3bf8
Correct test cases now
KuechA Feb 14, 2025
7a8e515
Add interprocedural and intraprocedural analysis flags for followPrev…
KuechA Feb 14, 2025
11d849c
Improve max steps functionality and call stack depth
KuechA Feb 14, 2025
887c116
Call depth
KuechA Feb 14, 2025
b100488
Fix c&p error
KuechA Feb 14, 2025
29653ba
Fix max steps logic
KuechA Feb 14, 2025
3374cf3
Fix max steps for real and add tests
KuechA Feb 14, 2025
0d36866
Reduce code, add tests
KuechA Feb 15, 2025
5b91753
Move to own file
KuechA Feb 15, 2025
b29afc8
Add interprocedural tests for validator
KuechA Feb 15, 2025
421f838
Tests work for interprocedural analysis
KuechA Feb 15, 2025
3e799c9
Clarify text
KuechA Feb 15, 2025
a57c1df
Documentation
KuechA Feb 15, 2025
0d46c26
Rename file
KuechA Feb 16, 2025
e423ffc
Add tests for execution path flows
KuechA Feb 16, 2025
ec0d3a7
Rename
KuechA Feb 16, 2025
63232bd
fix and todo
KuechA Feb 16, 2025
0e045db
Make AnalysisDirection a sealed class
KuechA Feb 16, 2025
a446268
Remove unused functions and add default arguments
KuechA Feb 16, 2025
bdc3c15
Remove default executionOrder function
KuechA Feb 16, 2025
3064d3c
No caps
KuechA Feb 16, 2025
9f850d2
Remove default dfg query functions
KuechA Feb 16, 2025
7a07789
Documentation for dataflow
KuechA Feb 16, 2025
ac9b5ee
More documentation
KuechA Feb 16, 2025
2005bb1
Sketch for extracting logic to the configuration
KuechA Feb 17, 2025
89d1faa
Move code to show idea
KuechA Feb 17, 2025
668dcc1
New dataflow analysis
KuechA Feb 17, 2025
27d68ce
Move code to CPG core and replace existing functions
KuechA Feb 17, 2025
9a3d958
Do not fail for empty stack
KuechA Feb 17, 2025
af0dae2
Also remove old followNextDFGEdgesUntilHit function
KuechA Feb 17, 2025
3c6743c
Move code
KuechA Feb 17, 2025
4425c98
Fix
KuechA Feb 17, 2025
d8bb820
Less duplicate code, less null checks, fix
KuechA Feb 17, 2025
156140b
Fix
KuechA Feb 17, 2025
babcb23
Also transfer most of EOG iteration
KuechA Feb 17, 2025
5758f79
Fix things
KuechA Feb 18, 2025
e193c57
Merge branch 'main' into ak/dataflow-extension
KuechA Feb 18, 2025
86cc969
Linebreak
KuechA Feb 18, 2025
784266c
More classes, more documentation
KuechA Feb 18, 2025
5a69399
Replace old functions
KuechA Feb 18, 2025
dbd3110
fix
KuechA Feb 18, 2025
2b1dfeb
Merge branch 'main' into ak/dataflow-extension
KuechA Feb 18, 2025
81a5638
objects instead of classes
KuechA Feb 18, 2025
78166a1
More objects
oxisto Feb 18, 2025
4fd32c8
Fixed
oxisto Feb 18, 2025
9199d5d
Some more documentation
oxisto Feb 18, 2025
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
@@ -0,0 +1,367 @@
/*
* Copyright (c) 2025, Fraunhofer AISEC. All rights reserved.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*
* $$$$$$\ $$$$$$$\ $$$$$$\
* $$ __$$\ $$ __$$\ $$ __$$\
* $$ / \__|$$ | $$ |$$ / \__|
* $$ | $$$$$$$ |$$ |$$$$\
* $$ | $$ ____/ $$ |\_$$ |
* $$ | $$\ $$ | $$ | $$ |
* \$$$$$ |$$ | \$$$$$ |
* \______/ \__| \______/
*
*/
package de.fraunhofer.aisec.cpg.query

import de.fraunhofer.aisec.cpg.graph.*
import de.fraunhofer.aisec.cpg.graph.AccessValues
import de.fraunhofer.aisec.cpg.graph.AnalysisSensitivity
import de.fraunhofer.aisec.cpg.graph.FilterUnreachableEOG
import de.fraunhofer.aisec.cpg.graph.Node
import de.fraunhofer.aisec.cpg.graph.edges.flows.Dataflow
import de.fraunhofer.aisec.cpg.graph.statements.expressions.Literal
import de.fraunhofer.aisec.cpg.graph.statements.expressions.Reference
import kotlin.collections.all

/**
* Converts the [FulfilledAndFailedPaths] to a list of [QueryTree]s containing the failed and
* fulfilled paths.
*/
fun FulfilledAndFailedPaths.toQueryTree(
startNode: Node,
queryType: String,
): List<QueryTree<Boolean>> {
return this.fulfilled.map {
QueryTree(
true,
mutableListOf(QueryTree(it)),
"$queryType from $startNode to ${it.last()} fulfills the requirement",
startNode,
)
} +
this.failed.map {
QueryTree(
false,
mutableListOf(QueryTree(it)),
"$queryType from $startNode to ${it.last()} does not fulfill the requirement",
startNode,
)
}
}

/** Determines if the predicate [Must] or [May] hold */
sealed class AnalysisType {
abstract fun createQueryTree(
evalRes: FulfilledAndFailedPaths,
startNode: Node,
queryType: String,
): QueryTree<Boolean>
}

/**
* The predicate must hold, i.e., all paths fulfill the property/requirement. No path violates the
* property/requirement.
*/
object Must : AnalysisType() {
override fun createQueryTree(
evalRes: FulfilledAndFailedPaths,
startNode: Node,
queryType: String,
): QueryTree<Boolean> {
val allPaths = evalRes.toQueryTree(startNode, queryType)

return QueryTree(
evalRes.failed.isEmpty(),
allPaths.toMutableList(),
"$queryType from $startNode to ${evalRes.fulfilled.map { it.last() }}",
startNode,
)
}
}

/**
* The predicate may hold, i.e., there is at least one path which fulfills the property/requirement.
*/
object May : AnalysisType() {
override fun createQueryTree(
evalRes: FulfilledAndFailedPaths,
startNode: Node,
queryType: String,
): QueryTree<Boolean> {
val allPaths = evalRes.toQueryTree(startNode, queryType)

return QueryTree(
evalRes.fulfilled.isNotEmpty(),
allPaths.toMutableList(),
"$queryType from $startNode to ${evalRes.fulfilled.map { it.last() }}",
startNode,
)
}
}

/**
* Follows the [Dataflow] edges from [startNode] in the given [direction] (default: Forward
* analysis) until reaching a node fulfilling [predicate].
*
* The interpretation of the analysis result can be configured as must analysis (all paths have to
* fulfill the criterion) or may analysis (at least one path has to fulfill the criterion) by
* setting the [type] parameter (default: [May]). Note that this only reasons about existing DFG
* paths, and it might not be sufficient if you actually want a guarantee that some action always
* happens with the data. In this case, you may need to check the [executionPath].
*
* The [sensitivities] can also be configured to a certain extent. The analysis can be run as
* inter-procedural or intra-procedural analysis. If [earlyTermination] is not `null`, this can be
* used as a criterion to make the query fail if this predicate is fulfilled before [predicate].
*/
fun dataFlow(
startNode: Node,
direction: AnalysisDirection = Forward(GraphToFollow.DFG),
type: AnalysisType = May,
vararg sensitivities: AnalysisSensitivity = FieldSensitive + ContextSensitive,
scope: AnalysisScope = Interprocedural(),
verbose: Boolean = true,
earlyTermination: ((Node) -> Boolean)? = null,
predicate: (Node) -> Boolean,
): QueryTree<Boolean> {
val collectFailedPaths = type == Must || verbose
val findAllPossiblePaths = type == Must || verbose
val earlyTermination = { n: Node, ctx: Context -> earlyTermination?.let { it(n) } == true }

val evalRes =
if (direction is Bidirectional) {
arrayOf(Forward(GraphToFollow.DFG), Backward(GraphToFollow.DFG))
} else {
arrayOf(direction)
}
.fold(FulfilledAndFailedPaths(listOf(), listOf())) { result, direction ->
result +
startNode.followDFGEdgesUntilHit(
collectFailedPaths = collectFailedPaths,
findAllPossiblePaths = findAllPossiblePaths,
direction = direction,
sensitivities = sensitivities,
scope = scope,
earlyTermination = earlyTermination,
predicate = predicate,
)
}

return type.createQueryTree(evalRes = evalRes, startNode = startNode, queryType = "data flow")
}

/**
* Follows the [de.fraunhofer.aisec.cpg.graph.edges.flows.EvaluationOrder] edges from [startNode] in
* the given [direction] (default: [Forward] analysis) until reaching a node fulfilling [predicate].
*
* The interpretation of the analysis result can be configured as must analysis (all paths have to
* fulfill the criterion) or may analysis (at least one path has to fulfill the criterion) by
* setting the [type] parameter (default: [May]).
*
* The analysis can be run as interprocedural or intraprocedural analysis by setting [scope]. If
* [earlyTermination] is not `null`, this can be used as a criterion to make the query fail if this
* predicate is fulfilled before [predicate].
*/
fun executionPath(
startNode: Node,
direction: AnalysisDirection = Forward(GraphToFollow.EOG),
type: AnalysisType = May,
scope: AnalysisScope = Interprocedural(),
verbose: Boolean = true,
earlyTermination: ((Node) -> Boolean)? = null,
predicate: (Node) -> Boolean,
): QueryTree<Boolean> {
val collectFailedPaths = type == Must || verbose
val findAllPossiblePaths = type == Must || verbose
val earlyTermination = { n: Node, ctx: Context -> earlyTermination?.let { it(n) } == true }

val evalRes =
if (direction is Bidirectional) {
arrayOf(Forward(GraphToFollow.EOG), Backward(GraphToFollow.EOG))
} else {
arrayOf(direction)
}
.fold(FulfilledAndFailedPaths(listOf(), listOf())) { result, direction ->
result +
startNode.followEOGEdgesUntilHit(
collectFailedPaths = collectFailedPaths,
findAllPossiblePaths = findAllPossiblePaths,
direction = direction,
sensitivities = FilterUnreachableEOG + ContextSensitive,
scope = scope,
earlyTermination = earlyTermination,
predicate = predicate,
)
}

return type.createQueryTree(
evalRes = evalRes,
startNode = startNode,
queryType = "execution path",
)
}

/**
* This function tracks if the data in [source] always flow through a node which fulfills
* [validatorPredicate] before reaching a sink which is specified by [sinkPredicate]. The analysis
* can be configured with [scope] and [sensitivities].
*/
fun dataFlowWithValidator(
source: Node,
validatorPredicate: (Node) -> Boolean,
sinkPredicate: (Node) -> Boolean,
scope: AnalysisScope,
vararg sensitivities: AnalysisSensitivity,
): QueryTree<Boolean> {
return source.alwaysFlowsTo(
allowOverwritingValue = true,
earlyTermination = sinkPredicate,
scope = scope,
sensitivities = sensitivities,
predicate = validatorPredicate,
)
}

/**
* This function tracks if the data in [this] always flow through a node which fulfills [predicate].
* An early termination can be specified by the predicate [earlyTermination].
* [allowOverwritingValue] can be used to configure if overwriting the value (or part of it) results
* in a failure of the requirement (if `false`) or if it does not affect the evaluation. The
* analysis can be configured with [scope] and [sensitivities].
*/
fun Node.alwaysFlowsTo(
allowOverwritingValue: Boolean = false,

Check warning on line 244 in cpg-analysis/src/main/kotlin/de/fraunhofer/aisec/cpg/query/FlowQueries.kt

View check run for this annotation

Codecov / codecov/patch

cpg-analysis/src/main/kotlin/de/fraunhofer/aisec/cpg/query/FlowQueries.kt#L244

Added line #L244 was not covered by tests
earlyTermination: ((Node) -> Boolean)? = null,
scope: AnalysisScope,
vararg sensitivities: AnalysisSensitivity =
ContextSensitive + FieldSensitive + FilterUnreachableEOG,

Check warning on line 248 in cpg-analysis/src/main/kotlin/de/fraunhofer/aisec/cpg/query/FlowQueries.kt

View check run for this annotation

Codecov / codecov/patch

cpg-analysis/src/main/kotlin/de/fraunhofer/aisec/cpg/query/FlowQueries.kt#L248

Added line #L248 was not covered by tests
predicate: (Node) -> Boolean,
): QueryTree<Boolean> {
val nextDFGPaths =
this.collectAllNextDFGPaths(
interproceduralAnalysis = scope is Interprocedural,
contextSensitive = ContextSensitive in sensitivities,
)
.flatten()
val earlyTerminationPredicate = { n: Node, ctx: Context ->
earlyTermination?.let { it(n) } == true ||
(!allowOverwritingValue &&
// TODO: This should be replaced with some check if the memory location/whatever
// where the data is kept is (partially) written to.
this in n.prevDFG &&
(n as? Reference)?.access == AccessValues.WRITE)
}
val nextEOGEvaluation =
this.followEOGEdgesUntilHit(
collectFailedPaths = true,
findAllPossiblePaths = true,
scope = scope,
sensitivities = sensitivities,
earlyTermination = earlyTerminationPredicate,
) {
predicate(it) && it in nextDFGPaths
}
val allChildren =
nextEOGEvaluation.failed.map {
QueryTree(
value = false,
children = mutableListOf(QueryTree(value = it)),
stringRepresentation =
"The EOG path reached the end " +
if (earlyTermination != null)
"(or ${it.lastOrNull()} which a predicate marking the end) "
else "" + "before passing through a node matching the required predicate.",
node = this,
)
} +
nextEOGEvaluation.fulfilled.map {
QueryTree(
value = true,
children = mutableListOf(QueryTree(value = it)),
stringRepresentation =
"The EOG path reached the node ${it.lastOrNull()} matching the required predicate" +
if (earlyTermination != null)
" before reaching a node matching the early termination predicate"
else "",
node = this,
)
}
return QueryTree(
value = nextEOGEvaluation.failed.isEmpty(),
children = allChildren.toMutableList(),
stringRepresentation =
if (nextEOGEvaluation.failed.isEmpty()) {
"All EOG paths fulfilled the predicate"
} else {
"Some EOG paths failed to fulfill the predicate"
},
node = this,
)
}

/**
* Aims to identify if the value which is in [this] reaches a node fulfilling the [predicate] on all
* execution paths. To do so, it goes some data flow steps backwards in the graph, ideally to find
* the last assignment but potentially also splits the value up into different parts (e.g. think of
* a `+` operation where we follow the lhs and the rhs) and then follows this value until the
* [predicate] is fulfilled on all execution paths. Note: Constant values (literals) are not
* followed if [followLiterals] is set to `false`.
*/
fun Node.allNonLiteralsFlowTo(
predicate: (Node) -> Boolean,
allowOverwritingValue: Boolean = false,

Check warning on line 323 in cpg-analysis/src/main/kotlin/de/fraunhofer/aisec/cpg/query/FlowQueries.kt

View check run for this annotation

Codecov / codecov/patch

cpg-analysis/src/main/kotlin/de/fraunhofer/aisec/cpg/query/FlowQueries.kt#L323

Added line #L323 was not covered by tests
scope: AnalysisScope,
vararg sensitivities: AnalysisSensitivity,
followLiterals: Boolean = false,
): QueryTree<Boolean> {
val worklist = mutableListOf<Node>(this)
val finalPathsChecked = mutableListOf<QueryTree<Boolean>>()
while (worklist.isNotEmpty()) {
val currentNode = worklist.removeFirst()
if (currentNode.prevDFG.isEmpty()) {
finalPathsChecked +=
currentNode.alwaysFlowsTo(
predicate = predicate,
allowOverwritingValue = allowOverwritingValue,
scope = scope,
sensitivities = sensitivities,
)
}
currentNode.prevDFG
.filter { followLiterals || it !is Literal<*> }
.forEach {
val pathResult =
it.alwaysFlowsTo(
predicate = predicate,
allowOverwritingValue = allowOverwritingValue,
scope = scope,
sensitivities = sensitivities,
)
if (pathResult.value) {
// This path always ends pathResult in a node fulfilling the predicate
finalPathsChecked.add(pathResult)
} else {
// This path contained some nodes which do not end up in a node fulfilling the
// predicate
worklist.add(it)
}
}
}

return QueryTree(
finalPathsChecked.all { it.value },
finalPathsChecked.toMutableList(),
node = this,
)
}
Loading
Loading