Gremlin, automatic use of indexes on where() steps #1168
Closed
marco-brandizi started this conversation in General
-
Hi @marco-brandizi, I'd like to check what's going on under the hood and why the index is not used. Could you share a minimal database on which I can run your query and see what's happening? Even a database with one record per type would be enough.
-
Hi all,
I thought I'd ask the ArcadeDB people about this issue, having realised that the core of the problem may be how indexes are used with Gremlin.
To summarise, I have edges that are 'ternary relations', i.e., in addition to the usual outgoing/incoming vertexes, the relation has an 'evidence' attribute, which contains the ID of a third 'EvidenceType' vertex (which could describe something like: manually curated, imported from <ref>, text mining).
The intuitive query for this (see my own answer) traverses the edges of interest, adds another traversal over V(), and uses where() to match the edge's attribute to the vertex ID in the second traversal. This is pretty similar to multiple JOINs in SQL or multiple MATCHes in Cypher. However, ArcadeDB seems to resolve it with a full scan of the evidence types for each matched edge in the second part. Apparently, it uses neither of the indexes (the one on the vertex ID or the one on the edge property), nor does it cache already-found vertexes (for 100 edges, there are just 4-5 linked evidences).
I understand that's usually the way Gremlin is implemented, but I'd like to investigate further: is there a way to tell the Gremlin engine to use the indexes? Is there a way to write that query in pure Gremlin, without the map-based approach that I discovered later (again, see the last part of my answer)? That approach isn't good when the in-memory intermediate result that you need to accumulate is too big.
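To make the cost difference concrete, here is a minimal sketch in plain Python. It is not Gremlin or ArcadeDB code, and it says nothing about how either engine is actually implemented. The data and all the names (`edges`, `evidence_types`, `join_by_scan`, `join_by_lookup`) are made up for illustration. It only contrasts scanning every evidence vertex for each edge with a single keyed lookup per edge, which is roughly what an index or a map would provide.

```python
# Hypothetical in-memory data: 100 edges that reference 5 evidence-type vertexes.
evidence_types = [{"id": f"ev{i}", "label": f"type {i}"} for i in range(5)]
edges = [
    {"out": f"a{i}", "in": f"b{i}", "evidence": f"ev{i % 5}"}
    for i in range(100)
]


def join_by_scan(edges, vertices):
    """For each edge, compare against every vertex (full scan per edge)."""
    comparisons = 0
    pairs = []
    for e in edges:
        for v in vertices:
            comparisons += 1
            if v["id"] == e["evidence"]:
                pairs.append((e, v))
    return pairs, comparisons


def join_by_lookup(edges, vertices):
    """Build a map keyed by vertex ID once, then do one lookup per edge."""
    by_id = {v["id"]: v for v in vertices}
    lookups = 0
    pairs = []
    for e in edges:
        lookups += 1
        v = by_id.get(e["evidence"])
        if v is not None:
            pairs.append((e, v))
    return pairs, lookups


scan_pairs, scan_cost = join_by_scan(edges, evidence_types)
lookup_pairs, lookup_cost = join_by_lookup(edges, evidence_types)
print(scan_cost, lookup_cost)
```

Both functions return the same edge/vertex pairs, but the scan does one comparison per edge per vertex, while the lookup does one map access per edge. The price of the lookup is that the map has to hold every candidate vertex in memory, which is the limitation mentioned above.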
Am I missing something in Gremlin (I'm quite new to it)?
Note that "model it differently" isn't possible in this case, and also I'm investigating the potential of Gremlin with various data models, including this one, which occurs quite often in my application domain.