Feature: Add random sampling function to help users use Kùzu for training GNNs #4665

Open
prrao87 opened this issue Dec 23, 2024 · 0 comments
Labels
feature New features or missing components of existing features

prrao87 commented Dec 23, 2024

API

Other

Description

Kùzu is mentioned in the following paper about using a graph database as a backend while training GNNs.
https://arxiv.org/pdf/2411.11375

As described in Figure 4 of the paper, one of the key steps during training is to perform random sampling of the returned nodes from a 2-hop query as follows:

MATCH (node_0:$NODE_TYPE)
WHERE node_0.id IN $SEED_NODES
OPTIONAL MATCH (node_0)-[rel_1:$REL_TYPE]->(node_1:$NODE_TYPE)-[rel_2:$REL_TYPE]->(node_2:$NODE_TYPE)
WITH node_0, node_1, node_2
ORDER BY rand()
LIMIT $MAX_NEIGHBOURS
RETURN
    node_0.id as src_id,
    node_1.id, node_1.features,
    node_2.id, node_2.features;

The ORDER BY rand() clause is where the random sampling falls short, on two counts:

  • Sampling quality: rand() doesn't provide a high enough level of randomness for training purposes (the author of the paper noted that they needed additional workarounds to add more randomness).
  • Performance: the query is slow because ORDER BY rand() followed by LIMIT is evaluated as a top-k over the matched rows, and we could do better from a query performance perspective.
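
To illustrate the performance point, here is a minimal sketch of single-pass reservoir sampling (Algorithm R), one well-known alternative to assigning a random key to every row and taking the top-k. It is plain Python over an in-memory iterable, not Kùzu code; the function name `reservoir_sample` and its parameters are hypothetical, and nothing here reflects how Kùzu implements or would implement sampling.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(rows: Iterable[T], k: int, seed: Optional[int] = None) -> List[T]:
    """Uniformly sample up to k rows from a stream in one pass (Algorithm R).

    Hypothetical helper for illustration only. It keeps at most k rows in
    memory and never sorts, unlike ORDER BY rand() LIMIT k.
    """
    rng = random.Random(seed)  # explicit seed makes a run reproducible
    reservoir: List[T] = []
    for i, row in enumerate(rows):
        if i < k:
            # Fill the reservoir with the first k rows.
            reservoir.append(row)
        else:
            # Keep the new row with probability k / (i + 1).
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                reservoir[j] = row
    return reservoir


# Usage: sample 10 rows out of a stream of 1000.
sample = reservoir_sample(range(1000), k=10, seed=42)
```

Each row is inspected once and memory stays bounded by k, which is the property a pushed-down sampler would presumably want.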

Feature

Per the author's observations, the desired graph database feature would be a function or other high-level utility in Cypher that pushes the random sampling down to the database layer, rather than having PyG do it in memory. Could we add such a function so that users can perform random sampling when training GNNs with Kùzu?
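
As a rough reference for what such a function might compute, the sketch below does per-hop fanout sampling over a plain adjacency dictionary, in the style of the neighbour samplers commonly used for GNN training. This is an assumption about the desired semantics, not something the paper or Kùzu specifies: the name `sample_two_hop`, the `fanout` parameter and the adjacency-dict input are all hypothetical. Note that it caps neighbours per node at each hop, whereas the LIMIT in the query above caps the total number of returned rows across all seed nodes.

```python
import random
from typing import Dict, List, Optional, Tuple


def sample_two_hop(
    adj: Dict[int, List[int]],
    seeds: List[int],
    fanout: int,
    seed: Optional[int] = None,
) -> List[Tuple[int, int, int]]:
    """Sample up to `fanout` neighbours per node at each of two hops.

    Hypothetical in-memory reference, not a Kùzu API. Returns
    (seed, hop-1 node, hop-2 node) triples. Unlike the OPTIONAL MATCH in
    the query, seeds without a full 2-hop path produce no rows here.
    """
    rng = random.Random(seed)
    triples: List[Tuple[int, int, int]] = []
    for s in seeds:
        hop1 = adj.get(s, [])
        # Sample without replacement, capped at the number of neighbours.
        for n1 in rng.sample(hop1, min(fanout, len(hop1))):
            hop2 = adj.get(n1, [])
            for n2 in rng.sample(hop2, min(fanout, len(hop2))):
                triples.append((s, n1, n2))
    return triples


# Usage: a toy directed graph with node 0 as the only seed.
toy_adj = {0: [1, 2, 3], 1: [4, 5], 2: [6], 3: []}
rows = sample_two_hop(toy_adj, seeds=[0], fanout=2, seed=0)
```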

@prrao87 prrao87 added the feature New features or missing components of existing features label Dec 23, 2024