emehrkay/gremlinpy


Note: version 3.6.0 changes how arguments are automatically bound in Function objects. Previously only the last argument of a function call was bound; starting with 3.6.0, every argument is bound. This may break tests that check the contents of the generated scripts, so please update them accordingly.

Gremlinpy

Gremlinpy is a small library that lets you write pure Python and output a GremlinGroovy script, complete with bound parameters, that can be run against a TinkerPop 3 Gremlin Server instance. It is meant to be a low-level way for your application to communicate its intent to the graph server. The best way to think of Gremlinpy is as a very flexible query builder that does not limit which GremlinGroovy you can express from Python.

gremlinpy logo by Marko A. Rodriguez

Setup

    python setup.py install

or

    pip install gremlinpy

or

    easy_install gremlinpy

Overview

Python's syntax mirrors Groovy's nearly one-to-one, so Gremlinpy gives you an easy-to-manipulate Python object that produces a Gremlin string as its result.

Gremlinpy works by tokenizing every action taken on the gremlinpy.gremlin.Gremlin instance into a simple linked list. Each Gremlin instance starts with a GraphVariable and chains the rest of the tokens to it.
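To make the idea concrete, here is a minimal toy sketch of a chain that records attribute access and calls as a singly linked list of tokens. `ToyGremlin` and `Token` are hypothetical names, not gremlinpy's real classes, and for brevity the sketch inlines arguments with `repr()` instead of binding them.

```python
class Token:
    """One link in the chain: its rendered text plus a pointer to the previous token."""

    def __init__(self, text, previous=None):
        self.text = text
        self.previous = previous


class ToyGremlin:
    """Hypothetical, minimal stand-in for gremlinpy.gremlin.Gremlin."""

    def __init__(self, graph_variable="g"):
        # The chain always starts with a graph-variable token.
        self.last = Token(graph_variable)

    def __getattr__(self, name):
        # Only called for names not defined on the class: record an attribute token.
        self.last = Token("." + name, self.last)
        return self

    def __call__(self, *args):
        # Calling the instance turns the most recent attribute into a function call.
        self.last.text += "(" + ", ".join(repr(a) for a in args) + ")"
        return self

    def __str__(self):
        # Walk back from the tail to the root, then emit the tokens in order.
        parts, token = [], self.last
        while token is not None:
            parts.append(token.text)
            token = token.previous
        return "".join(reversed(parts))


print(ToyGremlin().v(12).outE('knows').inV)
```

Because each token only points at its predecessor, appending a step is constant time and rendering is a single walk back to the root token.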

If you wanted to produce a Gremlin script like this:

    g.v(12).outE('knows').inV

You would simply write:

    from gremlinpy import Gremlin

    g = Gremlin()
    g.v(12).outE('knows').inV

Once that is converted to a string, your gremlin instance will hold the bound params (you have the ability to control the names of the bindings).

    script = str(g) #g.v(GP_KKEI_1).outE(GP_KKEI_2).inV
    params = g.bound_params #{'GP_KKEI_1': 12, 'GP_KKEI_2': 'knows'}

It's that simple, but Gremlinpy allows you to write very complex Gremlin syntax in pure Python. And if you're having trouble expressing the Gremlin/Groovy in the Python model, it allows for straight string manipulation complete with parameter binding.

Through the use of Python's magic methods, Gremlinpy composes a Gremlin string by recording every attribute access, item access, or call made against the instance and creating a token for each one. There are multiple token types that let you express Gremlin in Python:

GraphVariable: This is the root token in the list. It is always present and is used in every Gremlin string produced. It can be nullified by calling Gremlin.set_graph_variable('') or at instantiation with g = Gremlin(''). This is useful when you are running scripts that may not interact with the graph, or when your graph variable is something other than the letter "g".

Attribute: Attributes are the parts of the Gremlin chain that are not functions, closures, or indexes. They stand alone and are created whenever you access a name without parentheses on your Gremlinpy instance.

    g.a.b # g.a.b -- a and b are the attributes

Function: A Function token is created when you add parentheses after an attribute. The Function object binds every argument passed into it.

    g = Gremlin()
    g.V(12) # g.V(GP_UUID_1)
    g.bound_params # {'GP_UUID_1': 12}

    g = Gremlin()
    g.v.has('name', 'mark') # g.v.has(GP_CXZ_1, GP_CXZ_2)
    g.bound_params # {'GP_CXZ_1': 'name', 'GP_CXZ_2': 'mark'}
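As a rough sketch of what "binds every argument" means, the hypothetical `ToyBinder` below (not gremlinpy's code) replaces each argument with a generated key and records the key/value pair. The random per-instance prefix is an assumption modelled on key names like `GP_CXZ_1` in the examples above.

```python
import itertools
import uuid


class ToyBinder:
    """Hypothetical sketch: bind every function argument under a unique key."""

    def __init__(self):
        # Random per-instance prefix, in the spirit of keys like GP_CXZ_1.
        self.prefix = "GP_" + uuid.uuid4().hex[:4].upper()
        self._counter = itertools.count(1)
        self.bound_params = {}

    def bind(self, value):
        key = f"{self.prefix}_{next(self._counter)}"
        self.bound_params[key] = value
        return key

    def function(self, name, *args):
        # Each argument is replaced in the script by the key it was bound under.
        keys = ", ".join(self.bind(arg) for arg in args)
        return f".{name}({keys})"


binder = ToyBinder()
script = "g.v" + binder.function("has", "name", "mark")
print(script)               # the key prefix differs on every run
print(binder.bound_params)  # maps each generated key to its value
```

Only the keys appear in the script text; the values travel separately as the parameter map.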

A function can also be added to the chain by calling the func method on the Gremlin instance. The first argument is the name of the function; the remaining arguments are bound in the resulting string. This is useful for function names that are reserved words in Python.

    g.V(12).func('myMagicFunction', 'arg') # g.V(GP_III_1).myMagicFunction(GP_III_2)
    g.bound_params # {'GP_III_1': 12, 'GP_III_2': 'arg'}

FunctionRaw: Works like Function but does not prepend a dot before the function name.

    g.set_graph_variable('')
    g.function('arg').func_raw('some_function', 'raw content')
    # function(GP_IUNX_1)some_function(GP_IUNX_2)

UnboundFunction: This lets you call a function without having the instance automatically bind any of its params. Rather than chaining a call as in the previous examples, you pass the function name and its arguments to the unbound method:

    g.unbound('function', 'arg', 1, 3, 5) #g.function(arg, 1, 3, 5)

UnboundFunctionRaw: This works like UnboundFunction except that it does not prepend a dot before the function name.

    g.set_graph_variable('')
    g.func_raw_unbound('if', '1 == 2').close('1 is 2?').func_raw_unbound('elseif', '2 == 2').close('2 is 2')
    # if(1 == 2){1 is 2?}elseif(2 == 2){2 is 2}
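The four function-style tokens above differ along two independent switches: whether a dot is prepended and whether arguments are bound. A hypothetical renderer (a sketch, not gremlinpy's implementation, with illustrative `KEY_n` names) makes those two switches explicit:

```python
def render_function(name, args, params, *, dot=True, bind=True):
    """Hypothetical renderer for the four function token variants.

    dot=True,  bind=True   -> Function            .name(KEY_1, ...)
    dot=False, bind=True   -> FunctionRaw          name(KEY_1, ...)
    dot=True,  bind=False  -> UnboundFunction     .name(arg, ...)
    dot=False, bind=False  -> UnboundFunctionRaw   name(arg, ...)
    """
    rendered = []
    for arg in args:
        if bind:
            # Illustrative key naming; gremlinpy generates its own keys.
            key = f"KEY_{len(params) + 1}"
            params[key] = arg
            rendered.append(key)
        else:
            # Unbound arguments are inserted into the script verbatim.
            rendered.append(str(arg))
    prefix = "." if dot else ""
    return f"{prefix}{name}({', '.join(rendered)})"


params = {}
print(render_function("function", ["arg", 1, 3, 5], params, bind=False))
print(render_function("if", ["1 == 2"], params, dot=False, bind=False))
print(render_function("myMagicFunction", ["arg"], params), params)
```

Unbound arguments are pasted into the script as-is, so they should never carry untrusted input.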

Closure: Closures let you put content between curly braces. Since curly braces after a Python expression are a syntax error, closures have their own method on the Gremlin instance:

    g.func.close('im closing this') # g.func{im closing this}

ClosureArguments: Groovy allows inline lambda functions with a syntax that Python does not support. To define the closure's signature, pass the parameter names as additional arguments after the closure body:

    g.func.close('body', 'x', 'y') # g.func{x, y -> body }
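A small hypothetical helper (not part of gremlinpy) shows how the brace formatting in the two examples above could be produced. It copies the spacing of those examples exactly, including the space before the closing brace when parameters are present.

```python
def render_closure(body, *params):
    """Hypothetical helper: wrap body in braces, optionally with a Groovy parameter list."""
    if params:
        # Groovy closure signature: {a, b -> body }
        return "{" + ", ".join(params) + " -> " + body + " }"
    return "{" + body + "}"


print("g.func" + render_closure("im closing this"))
print("g.func" + render_closure("body", "x", "y"))
```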

Raw: Raw lets you put anything in and have it passed out unchanged. It adds nothing before or after the content. It is useful when you are doing something that does not map easily onto the other token types:

    g = Gremlin('')
    # .else would be a Python syntax error, so emit it with raw()
    (g.raw('if(').raw('1 == 2').raw(')')
        .close("'never'")
        .raw('else').close("'always'"))
    # if(1 == 2){'never'}else{'always'}

Note: this is just an example; there are better ways to do complex composition.

Predicates

GremlinGroovy has the concept of predicates. A predicate can be seen as a traversal that starts with a function instead of a graph variable.

Gremlinpy supports all of the predicates defined by GremlinGroovy, each implemented as a subclass of a Predicate object.

  • eq
  • neq
  • lt
  • lte
  • gt
  • gte
  • inside
  • outside
  • between
  • within
  • without

Gremlinpy also aliases some of the predicates that are reserved words in Python:

  • NOT -- not
  • IS -- is
  • AS -- as
  • IN -- in
  • FROM -- from
  • TO -- to
  • OR -- or

Predicates are used just like any other Gremlin instance:

    from gremlinpy import Gremlin

    g = Gremlin()
    g.V().has('name', neq('mark')) # g.V().has(GP_OOP_1, neq(GP_OOP_2))
    g.bound_params # {'GP_OOP_1': 'name', 'GP_OOP_2': 'mark'}

Gremlinpy allows you to create a predicate on the fly using the _ function:

    from gremlinpy import Gremlin, _

    g = Gremlin()
    g.V().has(_('myPredicate', [1, 2])) # g.V().has(myPredicate(GP_UIY_1, GP_UIY_2))
    g.bound_params # {'GP_UIY_1': 1, 'GP_UIY_2': 2}
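Since a predicate is just a function-first chain, it can render itself and share the parent's parameter map. The sketch below uses hypothetical `ToyPredicate` and `render_has` names (not gremlinpy's API) and illustrative `P_n` keys to show that plain values are bound directly while predicates bind their own arguments:

```python
def bind(params, value):
    """Bind value under the next illustrative key and return that key."""
    key = f"P_{len(params) + 1}"
    params[key] = value
    return key


class ToyPredicate:
    """Hypothetical predicate: renders as name(args), with no graph variable."""

    def __init__(self, name, *args):
        self.name = name
        self.args = args

    def render(self, params):
        return f"{self.name}({', '.join(bind(params, a) for a in self.args)})"


def render_has(params, *args):
    """Render g.V().has(...): plain values are bound, predicates render themselves."""
    parts = [
        arg.render(params) if isinstance(arg, ToyPredicate) else bind(params, arg)
        for arg in args
    ]
    return f"g.V().has({', '.join(parts)})"


params = {}
print(render_has(params, "name", ToyPredicate("neq", "mark")), params)
```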

Overloading

Some members of the Gremlin instance are effectively reserved words: using them calls the instance's own methods and does not add anything to your resulting Gremlin script.

These include:

  • __init__
  • reset
  • __getitem__
  • set_graph_variable
  • any other method or property on the object

If you need the resulting Gremlin script to print out '__init__' or one of the other reserved words, you can call add_token on your instance:

    init = Function(g, '__init__', ['arg'])
    g.add_token(init) # g.__init__(GP_XXX_1)
    
    init = Attribute(g, '__init__')
    g.add_token(init).xxx() # g.__init__.xxx()
    
    add_token = UnboundFunction(g, 'add_token', [5, 6])
    g.add_token(add_token) # g.add_token(5, 6)
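The reason these names are reserved follows from how Python resolves attributes: `__getattr__` is consulted only when ordinary lookup fails, so a name that already exists on the object (like a real `reset` method) never reaches the token-recording code. The hypothetical `Recorder` class below is assumed to mirror gremlinpy's approach, not copied from it:

```python
class Recorder:
    """Hypothetical chain showing why real methods shadow Gremlin step names."""

    def __init__(self):
        self.tokens = []

    def reset(self):
        # A real method: normal lookup finds it, so no token is recorded.
        self.tokens = []
        return self

    def __getattr__(self, name):
        # Python only calls this when normal attribute lookup fails.
        self.tokens.append(name)
        return self


r = Recorder()
r.out.inV
print(r.tokens)  # out and inV were recorded
r.reset
print(r.tokens)  # accessing reset did not record a token
```

This is why an explicit token object passed to add_token is needed to emit such names.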

Binding Params

All parameters passed into a function are automatically bound. Each Gremlin instance generates a unique key for every bound parameter value. However, you can also bind a param manually and give it the name you want.

    bound = g.bind_param('my_value', 'MY_PARAM')
    
    g.v(bound[0]) # g.v(MY_PARAM)
    g.bound_params # {'MY_PARAM': 'my_value'}

Gremlinpy attempts to reuse binding names in the generated script. If you previously bound 'mark is great' and bind it again, the script uses the same parameter name for both occurrences. This reduces the overall payload sent to the Gremlin server and what the Gremlin server has to cache.

    bound = g.bind_param('some value', 'SOME_KEY')
    g.function(bound[0]).func2(bound[0]) # g.function(SOME_KEY).func2(SOME_KEY)
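One straightforward way to implement this reuse is to look for an existing key that already holds an equal value before minting a new one. The hypothetical `ReusingBinder` below is a sketch of that idea, not gremlinpy's code:

```python
class ReusingBinder:
    """Hypothetical binder that reuses a key when an equal value is bound again."""

    def __init__(self, prefix="GP"):
        self.prefix = prefix
        self.bound_params = {}

    def bind(self, value):
        # Linear scan instead of a dict keyed by value, so unhashable
        # values such as lists can still be bound.
        for key, existing in self.bound_params.items():
            # Compare types too, so that 1 and True are not merged.
            if type(existing) is type(value) and existing == value:
                return key
        key = f"{self.prefix}_{len(self.bound_params) + 1}"
        self.bound_params[key] = value
        return key


b = ReusingBinder()
print(b.bind("mark is great"), b.bind("mark is great"), b.bind("other"))
```

The scan is linear in the number of bound values, which is fine for script-sized parameter maps.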

The easiest way to bind params is to use the Param object and pass it where needed:

    g = Gremlin()
    name_param = Param('name', 'mark')
    g.V().has('name', name_param)  # ...

Nesting Instances

Gremlinpy gets interesting when you want to compose a very complex string. It lets you nest Gremlin instances, passing any bound params up to the root instance.

Nesting gives you more control over query creation and offers some sanity when dealing with huge strings.

    g = Gremlin()
    i = Gremlin()
    
    i.set_graph_variable('').it.setProperty('age', 33)
    g.v(12).close(i) # g.v(GP_XXQ_1){it.setProperty(GP_UYI_1, GP_UYI_2)}
    
    g.bound_params # {'GP_XXQ_1': 12, 'GP_UYI_1': 'age', 'GP_UYI_2': 33}
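The param hand-off can be sketched with a hypothetical `ToyNode` class (an illustration, not gremlinpy's implementation): each fragment binds into its own map under its own prefix, and closing over a child embeds the child's script and merges the child's params into the parent. Distinct prefixes are assumed here to keep the keys from colliding.

```python
class ToyNode:
    """Hypothetical script fragment that owns its bound params."""

    def __init__(self, prefix, graph_variable="g"):
        self.prefix = prefix  # distinct prefixes keep keys from colliding
        self.script = graph_variable
        self.bound_params = {}

    def call(self, name, *args):
        keys = []
        for arg in args:
            key = f"{self.prefix}_{len(self.bound_params) + 1}"
            self.bound_params[key] = arg
            keys.append(key)
        dot = "." if self.script else ""
        self.script += f"{dot}{name}({', '.join(keys)})"
        return self

    def close(self, child):
        # Embed the child's script in braces and pass its params up to this node.
        self.script += "{" + child.script + "}"
        self.bound_params.update(child.bound_params)
        return self


inner = ToyNode("GP_UYI", graph_variable="it").call("setProperty", "age", 33)
outer = ToyNode("GP_XXQ").call("v", 12).close(inner)
print(outer.script)
print(outer.bound_params)
```

Because a child has already absorbed its own children's params before it is closed over, this merge also works for deeper nesting.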

Copying

Gremlinpy has a built-in way to copy instances. This is useful when you want to branch at the current point along the path but keep the preceding definitions intact.

    from gremlinpy import Gremlin

    page = get_page()  # your application's page number, assumed 1-based
    per_page = get_per_page()

    # get all of the blog posts for user 15
    g = Gremlin().V().hasLabel('user').hasId(15).out('blog_post')

    # we will branch here to get the total number of posts for the user
    total = g.copy().size()

    # we continue here to paginate; range() takes a start and an exclusive end index
    g.range((page - 1) * per_page, page * per_page)

    # run queries
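A hypothetical `Chain` class (not gremlinpy's implementation) illustrates why copying matters: if the branch shared the same token list as the original, the pagination step would leak into the count query. Copying the list gives each branch its own tail.

```python
class Chain:
    """Hypothetical sketch: copy() duplicates the token list so branches diverge."""

    def __init__(self, tokens=None):
        self.tokens = list(tokens) if tokens is not None else ["g"]

    def step(self, text):
        self.tokens.append(text)
        return self

    def copy(self):
        # A new list: later steps on either chain do not affect the other.
        return Chain(self.tokens)

    def __str__(self):
        return ".".join(self.tokens)


base = Chain().step("V()").step("hasLabel('user')")
total = base.copy().step("size()")
base.step("range(0, 10)")
print(total)
print(base)
```

A shallow list copy is enough here because the tokens are immutable strings; token objects that can be mutated in place would need a deeper copy.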

Statements

Gremlinpy allows you to compose very complex gremlin chains. A Gremlinpy Statement object allows you to encapsulate and package a part of that construction.

Gremlinpy works by tokenizing every action against the object instance into a simple linked list; a statement lets you apply a preset token definition to a Gremlin instance.
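A rough sketch of that mechanism uses hypothetical `ToyChain` and `ToyStatement` classes. It assumes that applying a statement hands it the target chain and then calls its `build()` method; the `self.gremlin` attribute in the examples below suggests this, but gremlinpy's real internals may differ.

```python
class ToyChain:
    """Hypothetical chain that can have statements applied to it."""

    def __init__(self):
        self.steps = ["g"]

    def add(self, step):
        self.steps.append(step)
        return self

    def apply_statement(self, statement):
        # Hand the statement this chain, let it add its tokens, keep chaining.
        statement.gremlin = self
        statement.build()
        return self

    def __str__(self):
        return ".".join(self.steps)


class ToyStatement:
    """Hypothetical base class: subclasses override build()."""

    gremlin = None

    def build(self):
        raise NotImplementedError


class HasName(ToyStatement):
    def __init__(self, name):
        self.name = name

    def build(self):
        self.gremlin.add(f"has('name', {self.name!r})")


print(ToyChain().add("V").apply_statement(HasName("Mark")))
```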

Usage

Statements can be used in a few ways; the simplest is to apply one directly to a Gremlin instance. When used this way, the statement augments the Gremlin instance it is applied to.

    class HasMark(Statement):
        """
        this statement simply appends .has('name', 'Mark') to a gremlin script
        """
        def build(self):
            g = self.gremlin
            
            g.has('name', 'Mark')
    
    g = Gremlin()
    mark = HasMark()
    g.V.apply_statement(mark)
    
    str(g) # g.V.has(GP_IOKH_1, GP_IOKH_2)
    g.bound_params # {'GP_IOKH_1': 'name', 'GP_IOKH_2': 'Mark'}

Statements can also be chained:

    class HasSex(Statement):
        def __init__(self, sex):
            self.sex = sex

        def build(self):
            self.gremlin.has('sex', self.sex)

    g = Gremlin()
    mark = HasMark()
    sex = HasSex('male')

    g.V.apply_statement(mark).apply_statement(sex)
    
    str(g) # g.V.has(GP_IOKH_1, GP_IOKH_2).has(GP_IOKH_3, GP_IOKH_4)
    g.bound_params # {'GP_IOKH_1': 'name', 'GP_IOKH_2': 'Mark', 'GP_IOKH_3': 'sex', 'GP_IOKH_4': 'male'}

A statement can also be passed into a Function, Raw, or Closure call on a Gremlin instance. Statements used this way do not modify the Gremlin instance they are passed into. If you want the statement to use a specialized Gremlin instance, you must pass it into the statement; otherwise a blank Gremlin instance is created and passed into the Statement.

Note: do not pass the outer Gremlin instance into a Statement that will be used this way, as doing so creates an infinite loop.

    class GetV(Statement):
        def __init__(self, id):
            self.id = id
        
        def build(self):
            self.gremlin.v(self.id)
            
    g = Gremlin()
    v = GetV(44)
    if_con = UnboundFunction(g, 'if', '1 == 1')
    
    g.set_graph_variable('').add_token(if_con).close(v)
    
    str(g) # if(1 == 1){g.v(GP_DDIO_1)}
    g.bound_params # {'GP_DDIO_1': 44}

Performance Tweaks

Always Manually Bind Params

If your Gremlin server instance has query caching turned on, manually binding params lets the server reuse the already-parsed script the second time you run it and return results more quickly.

If you are not manually binding params, every call sends a different script to the server, even when it is the same query:

    g.v(12) # g.v(GP_XSX_1)
   
    #later 
    g.v(12) # g.v(GP_POI_1)

If you manually bind the param with a name, the same script will be sent to the server and it will drastically cut down on execution times. This is true even if the param values are changed:

    bound = g.bind_param(12, 'eyed')
    g.v(bound[0]) # g.v(eyed)
    
    bound = g.bind_param(9999, 'eyed')
    g.v(bound[0]) # g.v(eyed)  <--- this one executes faster than the first
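The effect can be simulated with a hypothetical `ToyServer` that, as assumed here, caches compiled scripts keyed by their exact text. Automatically generated names produce new script text on every call, so every call pays the compile cost; a fixed manual name keeps the text identical while only the parameter values change.

```python
import itertools


class ToyServer:
    """Hypothetical server that caches compiled scripts keyed by their text."""

    def __init__(self):
        self.compiled = set()
        self.compilations = 0

    def run(self, script, params):
        # Only script text the server has not seen yet pays the compile cost.
        if script not in self.compiled:
            self.compiled.add(script)
            self.compilations += 1
        return script, params


_auto_keys = itertools.count(1)


def auto_bound(value):
    # Mimics automatic binding: a fresh key name on every call.
    key = f"GP_AUTO_{next(_auto_keys)}"
    return f"g.v({key})", {key: value}


def manually_bound(value):
    # A fixed, manually chosen name keeps the script text identical.
    return "g.v(eyed)", {"eyed": value}


manual, auto = ToyServer(), ToyServer()
for value in (12, 12, 9999):
    manual.run(*manually_bound(value))
    auto.run(*auto_bound(value))
print(manual.compilations, auto.compilations)
```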