Graql Crash Course
This guide will get you started with Graql, the query language for Grakn. There are a ton of Graql features we won't cover here because they don't make sense in the context of BNIL data. If you want to deep dive into Graql, have at it. Otherwise we'll focus on the elements of Graql that are useful for exploring and using datasets generated by Paper Machete.
Grakn Labs recently released Grakn Academy. Check it out.
GRAKN.AI uses an ontology (fancy word for schema) to apply structure to the data you load into it. The concept of the ontology is extremely important when considering our data and the relationships between data. In fact, prior to v1.0, Grakn used the term ontology exclusively, which probably scared and confused a lot of people. Now they pretty much only use the term schema.
When we ask Grakn questions about our data with Graql, we need to know the general structure of the data. This is all defined in the Paper Machete ontology.
Consider a function. A function in our ontology looks like this:
function sub entity
plays in-function
has func-name
has asm-address
has stack;
An 'entity' is a base concept (or 'thing') in Grakn. This definition says the thing is called a 'function' and has up to three 'resources': a func-name, an asm-address, and a stack. This entity can also play a role called in-function.
The important takeaway here is that 'resource' names like func-name and asm-address are completely arbitrary. They are names we came up with to describe the properties associated with the concept of a function. So when you start to write Graql queries and wonder why we reference things like func-name or operation-type, it's because these are the names we specified in the PM ontology. The plays specifier defines a role this concept can play. These roles are important because they let us form relationships between (link) multiple concepts that play compatible roles.
When in doubt, reference the ontology.
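To make the pieces of that definition concrete, here is a minimal Python sketch that mirrors the function entity as plain data. It is only an illustration: the `EntityType` class and the `can_have` helper are hypothetical names invented for this example, not part of Grakn or Paper Machete.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class EntityType:
    """Hypothetical stand-in for one entity definition in the ontology."""
    name: str
    plays: List[str] = field(default_factory=list)  # roles this type can play
    has: List[str] = field(default_factory=list)    # resources it may carry


# Mirrors the `function sub entity` definition shown above.
FUNCTION = EntityType(
    name="function",
    plays=["in-function"],
    has=["func-name", "asm-address", "stack"],
)


def can_have(entity_type: EntityType, resource: str) -> bool:
    """Return True if the definition lets this type carry the resource."""
    return resource in entity_type.has


print(can_have(FUNCTION, "func-name"))       # True
print(can_have(FUNCTION, "operation-type"))  # False
```

The second check fails because operation-type is not listed in the function definition shown above, which is exactly the kind of thing the ontology tells you before you write a query.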
You've analyzed your first binary, which spit out a JSON file. You migrated the data from that JSON file into Grakn, and excitement is mounting! But now what?
Grakn makes it easy to visualize your data through the Grakn Visualizer, a web front end that starts on localhost port 4567 by default. So fire up a web browser and head to http://127.0.0.1:4567.
Let's assume you've analyzed the Barcoder CGC binary and you want to explore your dataset.
First, we can display some functions in this dataset by executing the query:
match $f isa function;
You should see a bunch of blocks fly out on the screen. If you left-click and hold, a context menu will appear which allows you to display properties of those nodes. Try it! Tick the boxes next to type, func-name, and asm-address, then click the red box to change the color of the function nodes.
If you look closely at the query line, you'll see that your query was changed to:
match $f isa function; offset 0; limit 30;
But why? By default, Grakn will limit the number of results to 30 in order to keep your browser from freaking the hell out should there be millions of results. If you want more results, just increase the limit.
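If the offset and limit modifiers are unfamiliar, they behave like a window over the full result list. The Python sketch below is only a model of that behaviour, assuming a made-up dataset of 115 placeholder function names; the `page` helper is hypothetical and is not a Grakn API.

```python
def page(results, offset=0, limit=30):
    """Return at most `limit` results, skipping the first `offset`."""
    return results[offset:offset + limit]


# Placeholder data standing in for the functions in a dataset.
functions = [f"func_{i}" for i in range(115)]

first_page = page(functions)             # same window as offset 0; limit 30;
last_page = page(functions, offset=90)   # whatever is left after 90 results

print(len(first_page), len(last_page))   # 30 25
```

Raising the limit widens the window, and raising the offset slides it further into the results.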
Let's say you're looking for the main function. Simply issue this query:
match $f isa function, has func-name "main";
Let's say you're looking for any function that contains "cgc" in the name:
match $f isa function, has func-name contains "cgc";
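The difference between the exact match and the contains match can be sketched in a few lines of Python. This is a rough model rather than how Grakn evaluates the query: the `match_func_name` helper is hypothetical, the name list is a small sample, and the sketch assumes the substring test is case-sensitive.

```python
def match_func_name(names, equals=None, contains=None):
    """Filter function names by exact value and/or substring."""
    matched = []
    for name in names:
        if equals is not None and name != equals:
            continue
        if contains is not None and contains not in name:
            continue
        matched.append(name)
    return matched


# A small sample of function names for illustration.
names = ["main", "cgc_fxlat", "cgc_hash_seed", "cgc_tolower"]

print(match_func_name(names, equals="main"))
# ['main']
print(match_func_name(names, contains="cgc"))
# ['cgc_fxlat', 'cgc_hash_seed', 'cgc_tolower']
```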
Graql also allows you to perform compute queries on the dataset. So let's say you want to know how many functions are in the dataset:
compute count in function;
>> 115
Cool, so the Barcoder binary has 115 functions! What about basic-blocks or instructions? We can compute the count of those as well:
compute count in basic-block;
>> 1359
compute count in instruction;
>> 7290
Whoa, 7290 instructions in 1359 basic-blocks? That's good to know. Graql offers many more compute query types, so you aren't limited to count queries, although compute queries like min, max, and median don't make much sense for our dataset.
One of the great things about Graql is how easy it is to find relationships between data. Let's say you want to find all instructions that use some form of XOR operation. No problem!
match $i isa instruction; $x isa MLIL_XOR; ($i,$x);
Let's break this down, since there are actually three match queries going on here.
The first query finds all entities of type instruction. In other words, it literally returns all instructions in our target binary.
match $i isa instruction;
The second query finds all entities of type MLIL_XOR, which is a sub-entity of an operation (again, look at the ontology if you want to see what I mean).
match $x isa MLIL_XOR;
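The third query takes these two sets of data and keeps only the instruction and XOR pairs that are linked by a relationship. This is effectively a join between the two sets, not a union of them.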
match ($i,$x);
Graql expects us to strip all extraneous match keywords. It knows what you mean without having to type match three times.
So what's the result of this query? Seven instructions. That's only seven instructions out of the 7290 instructions in the Barcoder binary that match our criteria!
cgc_fxlat : 88 0x80496ff : edx_7#11 = edx_6#10 ^ ecx_6#11
cgc_fxlat : 91 0x8049707 : edx_8#12 = edx_7#11 ^ ecx_8#13
cgc_hash_seed : 20 0x80497ce : eax_6#9 = eax_5#8 ^ var_10_1#3
cgc_hash_seed : 26 0x80497e0 : eax_9#12 = eax_8#11 ^ ecx_4#5
cgc_strtoul : 213 0x804a28d : eax_75#92 = eax_74#91 ^ 0x80000000
cgc_tolower : 13 0x804c48e : eax_6#8 = eax_5#7 ^ 0x20
cgc_toupper : 13 0x804c3ce : eax_6#8 = eax_5#7 ^ 0x20
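The three-part breakdown above can be modelled in plain Python to show why so few instructions come back. This is a toy sketch, not Grakn's query engine: the identifiers (`i1`, `x1`, `a1`, and so on) and the `match_linked` helper are made up, and relationships are simplified to unordered pairs.

```python
def match_linked(left, right, relations):
    """Return (l, r) pairs that are joined by some relationship."""
    pairs = []
    for a, b in relations:
        # ($i,$x) names no roles, so accept the pair in either order.
        if a in left and b in right:
            pairs.append((a, b))
        elif b in left and a in right:
            pairs.append((b, a))
    return pairs


instructions = {"i1", "i2", "i3"}          # $i isa instruction
xor_ops = {"x1"}                           # $x isa MLIL_XOR
relations = [("i1", "x1"), ("i2", "a1")]   # a1 is some non-XOR operation

print(match_linked(instructions, xor_ops, relations))
# [('i1', 'x1')]
```

Only i1 survives because it is the only instruction related to an XOR operation. The other instructions match the first pattern but have nothing to pair with in the third.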
Hopefully you can begin to see the power of Graql. If not, I think you'll be pleased once you start using it on your own.