This repository has been archived by the owner on Nov 10, 2022. It is now read-only.

Graql Crash Course

John edited this page Mar 6, 2018 · 1 revision

Introduction

This guide will get you started with Graql, the query language for Grakn. There are a ton of Graql features we won't cover here because they don't make sense in the context of BNIL data. If you want to deep dive into Graql, have at it. Otherwise we'll focus on elements of Graql that are useful for exploring and using datasets generated by Paper Machete.

Grakn Labs recently released Grakn Academy. Check it out.

Importance of Knowing the Ontology (Schema)

GRAKN.AI uses an ontology (a fancy word for schema; read more here) to apply structure to the data you load into it. The concept of the ontology is extremely important when considering our data and the relationships between data. In fact, prior to v1.0, Grakn used the term ontology exclusively, which probably scared and confused a lot of people. Now they pretty much only use the term schema.

When we ask Grakn questions about our data with Graql, we need to know the general structure of the data. This is all defined in the Paper Machete ontology.

Consider a function. A function in our ontology looks like this:

function sub entity
	plays in-function
	has func-name
	has asm-address
	has stack;

An 'entity' is a base concept (or 'thing') in Grakn. The definition above says this thing is called a 'function' and has up to three 'resources': a func-name, an asm-address, and a stack. This entity can also play a role called in-function.

The important takeaway here is that 'resource' names like func-name and asm-address are completely arbitrary. They are names we came up with to describe the properties associated with the concept of a function. So when you start to write Graql queries and wonder why we reference things like func-name or operation-type, it's because these are the names we specified in the PM ontology. The plays specifier defines a role this concept can play. These roles are important because they let us form relationships between (that is, link) multiple concepts that play compatible roles.
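One way to picture the definition above is as a small record: a type name, the resources it may carry, and the roles it may play. The Python sketch below only illustrates that idea. The `EntityType` class and `can_have` helper are hypothetical names invented for this example and are not part of Grakn or Paper Machete.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class EntityType:
    """Hypothetical stand-in for one entity definition in the ontology."""
    name: str
    resources: Tuple[str, ...]  # names listed after `has`
    roles: Tuple[str, ...]      # names listed after `plays`


# Mirrors the `function sub entity` definition shown above.
function = EntityType(
    name="function",
    resources=("func-name", "asm-address", "stack"),
    roles=("in-function",),
)


def can_have(entity_type: EntityType, resource: str) -> bool:
    """Return True if the ontology lets this type carry the given resource."""
    return resource in entity_type.resources


print(can_have(function, "func-name"))       # True
print(can_have(function, "operation-type"))  # False: not declared on function
```

Checking a name against the definition like this is the programmatic version of looking it up in the ontology before writing a query.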

When in doubt, reference the ontology.

The Grakn Data Visualizer

You've analyzed your first binary, which spit out a JSON file, and you've migrated the data from that JSON file into Grakn. Excitement is mounting! But now what?

Grakn makes it easy to visualize your data through the Grakn Visualizer, a web front end that starts on localhost port 4567 by default. So fire up a web browser and head to http://127.0.0.1:4567.

[Screenshot: Web Interface]

Your first Graql queries

Let's assume you've analyzed the Barcoder CGC binary and you want to explore your dataset.

First, we can display some functions in this dataset by executing the query:

match $f isa function;

You should see a bunch of blocks fly out on the screen. If you left-click and hold, a context menu will appear that lets you display properties of those nodes. Try it: tick the boxes next to type, func-name, and asm-address, then click the red box to change the color of the function nodes.

[Screenshot: function query]

If you look closely at the query line, you'll see that your query was changed to:

match $f isa function; offset 0; limit 30;

But why? By default, Grakn will limit the number of results to 30 in order to keep your browser from freaking the hell out should there be millions of results. If you want more results, just increase the limit.
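The effect of `offset` and `limit` can be pictured as slicing a list of results. The Python sketch below is a toy model of that behavior, not Grakn code. The `page` helper and the placeholder function names are assumptions made for illustration.

```python
from typing import List


def page(results: List[str], offset: int = 0, limit: int = 30) -> List[str]:
    """Skip the first `offset` results, then return at most `limit` of them."""
    return results[offset:offset + limit]


# Placeholder names standing in for function nodes in a dataset.
functions = [f"func_{n}" for n in range(115)]

first_page = page(functions)               # like: offset 0; limit 30;
second_page = page(functions, offset=30)   # like: offset 30; limit 30;
everything = page(functions, limit=200)    # a limit above the total returns all

print(len(first_page), len(second_page), len(everything))  # 30 30 115
```

Raising the limit trades a fuller picture for a heavier page, which is why the visualizer starts with a small default.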

Let's say you're looking for the main function. Simply issue this query:

match $f isa function, has func-name "main";

Let's say you're looking for any function that contains "cgc" in the name:

match $f isa function, has func-name contains "cgc";

[Screenshot: cgc filter]
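The difference between the two queries above is exact match versus substring match on the func-name resource. As a rough Python analogy (the list of names is made up for illustration, apart from "main" and the cgc_ names that appear in this guide):

```python
# Hypothetical func-name values; "_start" is invented for this example.
names = ["main", "cgc_fxlat", "cgc_hash_seed", "cgc_strtoul", "_start"]

# Analogy for: has func-name "main"
exact = [n for n in names if n == "main"]

# Analogy for: has func-name contains "cgc"
partial = [n for n in names if "cgc" in n]

print(exact)    # ['main']
print(partial)  # ['cgc_fxlat', 'cgc_hash_seed', 'cgc_strtoul']
```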

Graql also allows you to perform compute queries on the dataset. So let's say you want to know how many functions are in the dataset:

compute count in function;
>> 115

Cool, so the Barcoder binary has 115 functions! What about basic-blocks or instructions? We can compute the count of those as well:

compute count in basic-block;
>> 1359
compute count in instruction;
>> 7290

Whoa, 7290 instructions in 1359 basic-blocks? That's good to know. Graql offers many more compute query types, so you aren't limited to count queries, although compute queries like min, max, and median don't make much sense for our dataset.

Finding relationships between data

One of the great things about Graql is how easy it is to find relationships between data. Let's say you want to find all instructions that use some form of XOR operation. No problem!

match $i isa instruction; $x isa MLIL_XOR; ($i,$x);

[Screenshot: linked nodes xor]

Let's break this down, since there are actually three match queries going on here.

The first query finds all entities of type instruction. In other words, it literally returns all instructions in our target binary.

match $i isa instruction;

The second query finds all entities of type MLIL_XOR which is a sub-entity of an operation (again, look at the ontology if you want to see what I mean).

match $x isa MLIL_XOR;

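The third query takes these two sets of data and keeps only the instructions and MLIL_XOR operations that are linked to each other by a relationship. This is a join rather than a union: an instruction with no link to an MLIL_XOR is dropped from the results.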

match ($i,$x);

Graql expects us to strip all extraneous match keywords. It knows what you mean without you having to type match three times.
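To see how the three parts combine, here is a toy Python model of the query. It is only a sketch under simplifying assumptions: the IDs, the `relationships` list, and the `linked_pairs` helper are all hypothetical, and Grakn does not store or evaluate data this way.

```python
# Toy data with made-up IDs.
instructions = {"i1", "i2", "i3"}   # stands in for: $i isa instruction;
xor_ops = {"x1", "x2"}              # stands in for: $x isa MLIL_XOR;

# Each tuple is one relationship linking two concepts (order is arbitrary).
relationships = [("i1", "x1"), ("i2", "a1"), ("x2", "i3")]


def linked_pairs(left, right, links):
    """Keep only (left, right) pairs that some relationship connects."""
    pairs = []
    for a, b in links:
        if a in left and b in right:
            pairs.append((a, b))
        elif b in left and a in right:
            pairs.append((b, a))
    return pairs


# Stands in for: ($i,$x);
matches = linked_pairs(instructions, xor_ops, relationships)
print(matches)  # [('i1', 'x1'), ('i3', 'x2')]
```

Here `i2` is an instruction, but it is linked to something other than an MLIL_XOR, so it does not appear in the result. The real query applies the same filtering to the full dataset.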

So what's the result of this query? Seven instructions. That's only seven instructions out of the 7290 instructions in the Barcoder binary that match our criteria!

cgc_fxlat     :  88 0x80496ff : edx_7#11 = edx_6#10 ^ ecx_6#11 
cgc_fxlat     :  91 0x8049707 : edx_8#12 = edx_7#11 ^ ecx_8#13
cgc_hash_seed :  20 0x80497ce : eax_6#9 = eax_5#8 ^ var_10_1#3
cgc_hash_seed :  26 0x80497e0 : eax_9#12 = eax_8#11 ^ ecx_4#5
cgc_strtoul   : 213 0x804a28d : eax_75#92 = eax_74#91 ^ 0x80000000
cgc_tolower   :  13 0x804c48e : eax_6#8 = eax_5#7 ^ 0x20
cgc_toupper   :  13 0x804c3ce : eax_6#8 = eax_5#7 ^ 0x20
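If you copy results like these out of the visualizer, a few lines of Python are enough to summarize them. The sketch below parses the seven rows listed above. The column meanings are an assumption: the number before the address is treated as the instruction's index within its function, and `parse_row` is a hypothetical helper written for this example.

```python
from collections import Counter

RESULTS = """\
cgc_fxlat     :  88 0x80496ff : edx_7#11 = edx_6#10 ^ ecx_6#11
cgc_fxlat     :  91 0x8049707 : edx_8#12 = edx_7#11 ^ ecx_8#13
cgc_hash_seed :  20 0x80497ce : eax_6#9 = eax_5#8 ^ var_10_1#3
cgc_hash_seed :  26 0x80497e0 : eax_9#12 = eax_8#11 ^ ecx_4#5
cgc_strtoul   : 213 0x804a28d : eax_75#92 = eax_74#91 ^ 0x80000000
cgc_tolower   :  13 0x804c48e : eax_6#8 = eax_5#7 ^ 0x20
cgc_toupper   :  13 0x804c3ce : eax_6#8 = eax_5#7 ^ 0x20
"""


def parse_row(row):
    """Split one 'function : index address : text' row into fields."""
    func, location, text = (part.strip() for part in row.split(" : ", 2))
    index, address = location.split()
    return {
        "function": func,
        "index": int(index),          # assumed: instruction index in the function
        "address": int(address, 16),
        "text": text,
    }


rows = [parse_row(line) for line in RESULTS.strip().splitlines()]

# How many XOR instructions does each function contain?
per_function = Counter(row["function"] for row in rows)

# Which XORs use a constant as the right-hand operand?
constant_xors = [
    row["function"]
    for row in rows
    if row["text"].split("^")[1].strip().startswith("0x")
]

print(len(rows))       # 7
print(constant_xors)   # ['cgc_strtoul', 'cgc_tolower', 'cgc_toupper']
```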

Hopefully you can begin to see the power of Graql. If not, I think you'll be pleased once you start using it on your own.