GID Graph format
Mehdi edited this page Aug 23, 2022 · 2 revisions
This page briefly describes the format of the GID Graph, which is produced by the Metadata Plugin and consumed by the Graph Plugin.
Version 1
A GID Graph represents a graph with both internal and external nodes and the edges between them. GID stands for Global Identifier: every node ID in the graph is globally unique because it is generated by the Metadata Database (PostgreSQL).
Below is an example of a GID Graph as used for communication between the Metadata and Graph Plugins:
{
"index": 1,
"product": "test",
"version": "0.0.1",
"nodes": [0, 1, 2],
"numInternalNodes": 3,
"edges": [[0, 1], [1, 2]]
}
- index is the ID of the package version of the product, generated by the Metadata Database. It is needed to retrieve a graph from the Graph Database by the global index of its package version.
- product is the name of the package being saved, i.e. <groupId>.<artifactId>.
- version is the version of the package being saved.
- nodes is the array of GIDs of the graph's nodes. Important! All internal nodes must be listed first, followed by all external nodes. This order is what makes it possible to differentiate between internal and external nodes.
- numInternalNodes is the number of internal nodes listed in the nodes array.
- edges is the array of arrays (pairs) of nodes that represent the edges of the graph. NB! If edges contains any node that is not listed in the nodes array, the Graph Plugin throws an IllegalArgumentException when it consumes such a GID Graph.
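The ordering and consistency rules above can be illustrated with a small sketch. It is written in Python for brevity and is not the Graph Plugin's actual code: the function name `split_and_validate` is hypothetical, `ValueError` stands in for the `IllegalArgumentException` the plugin throws, and the range check on numInternalNodes is an extra assumption that the page does not state.

```python
import json


def split_and_validate(gid_graph):
    """Split nodes into (internal, external) and check the edge endpoints.

    Hypothetical helper: ValueError plays the role of the
    IllegalArgumentException thrown by the Graph Plugin.
    """
    nodes = gid_graph["nodes"]
    num_internal = gid_graph["numInternalNodes"]
    # Assumption (not stated on the page): the count must fit the array.
    if not 0 <= num_internal <= len(nodes):
        raise ValueError("numInternalNodes is out of range for the nodes array")

    # Internal nodes come first, so a single slice separates the two groups.
    internal = nodes[:num_internal]
    external = nodes[num_internal:]

    # Every endpoint of every edge must be listed in the nodes array.
    known = set(nodes)
    for first, second in gid_graph["edges"]:
        if first not in known or second not in known:
            raise ValueError(
                f"edge [{first}, {second}] references a node missing from nodes"
            )
    return internal, external


# The Version 1 example from this page.
example = json.loads("""
{
  "index": 1,
  "product": "test",
  "version": "0.0.1",
  "nodes": [0, 1, 2],
  "numInternalNodes": 3,
  "edges": [[0, 1], [1, 2]]
}
""")

internal, external = split_and_validate(example)
print(internal, external)  # prints: [0, 1, 2] []
```

In the example every node is internal, so the external list is empty. A graph whose nodes array is [0, 1, 2, 7] with numInternalNodes set to 3 would yield [7] as its only external node.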
Version 2
{
"index": 1,
"product": "test",
"version": "0.0.1",
"nodes": [0,1,2],
"numInternalNodes": 3,
"edges": [],
"callsites_info": {
"[0, 1]": {
"line": 31,
"receiver_type_ids": [5, ...],
"call_type": "virtual"
}, ...
},
"types_map": {"5": "/java.util/Collections", ...},
"gid_to_uri": {"0": "/java.util/Collections.emptySet()Set", ...}
}
Version 2 is an extension of the first version. It adds several pieces of data to the previous representation, which allows us to stitch call graphs on demand. The additional data is the following:
- callsites_info holds the information about call sites that we need in order to find all potential targets. This includes receiver_type_ids, the types used to make the call, and call_type, the bytecode instruction used in the call, which indicates, for example, whether or not the call uses dynamic dispatch.
- types_map is a map of the ids that we use to refer to types. For example, in the receiver_type_ids section of the example above we can use id 5 instead of writing the full name of the Java type Collections.
- gid_to_uri is similar to types_map: we use a map to store the full URIs of the methods and then use the ids instead of the full string name of each method.
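As a rough illustration of how the two lookup maps are meant to be used, the Python sketch below expands the ids of each call site back into full names. It is only a sketch under assumptions: the function name `resolve_callsites` is hypothetical, the keys of callsites_info are assumed to be JSON-encoded pairs as in the example above, the entries elided with "..." are simply left out, and the URI given for GID 1 is invented for the demonstration.

```python
import json


def resolve_callsites(gid_graph):
    """Replace type ids and GIDs in callsites_info with their full names.

    Hypothetical helper; ids missing from a map resolve to None.
    """
    types_map = gid_graph.get("types_map", {})
    gid_to_uri = gid_graph.get("gid_to_uri", {})
    resolved = []
    for key, info in gid_graph.get("callsites_info", {}).items():
        # Assumption: each key is a JSON-encoded pair of GIDs, e.g. "[0, 1]".
        first, second = json.loads(key)
        resolved.append({
            # JSON object keys are strings, so the ids are converted first.
            "pair": (gid_to_uri.get(str(first)), gid_to_uri.get(str(second))),
            "line": info["line"],
            "call_type": info["call_type"],
            "receiver_types": [
                types_map.get(str(type_id))
                for type_id in info["receiver_type_ids"]
            ],
        })
    return resolved


sample = {
    "callsites_info": {
        "[0, 1]": {"line": 31, "receiver_type_ids": [5], "call_type": "virtual"}
    },
    "types_map": {"5": "/java.util/Collections"},
    "gid_to_uri": {
        "0": "/java.util/Collections.emptySet()Set",
        # Invented URI for illustration only; it is not taken from the page.
        "1": "/hypothetical.pkg/Caller.run()VoidType",
    },
}

for site in resolve_callsites(sample):
    print(site["line"], site["call_type"], site["receiver_types"])
```

Storing each type and method name once and referring to it by id keeps the per-call-site records small, which is the motivation the list above gives for both maps.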