
V 0.2.2 Release Note

@classicsong classicsong released this 26 Feb 20:35

GraphStorm V0.2.2 contains several major feature enhancements. In this release, we have enhanced NVIDIA WholeGraph support to speed up access to learnable embeddings during training and to cached BERT embeddings. We have added a customized negative sampling method for link prediction tasks, which lets users define negative edges for each individual edge. We have added two new feature transformations to our distributed graph processing pipeline: textual feature tokenization with HuggingFace models and textual feature encoding with HuggingFace models. We have also simplified the command line interface for model prototyping by removing the requirement to set up ssh for running GraphStorm jobs on a single machine. Finally, we have added an example that uses the custom model interface to perform GPEFT training, enhancing an LLM with graph data.

Major features

  • Support using WholeGraph distributed embedding to speed up learnable embedding training. #677 #697 #734
  • Support using WholeGraph distributed embedding to speed up reading cached BERT embeddings. #737
  • Support hard negatives for link prediction tasks. #678 #684 #703
  • Distributed graph processing pipeline supports using HuggingFace models to encode textual node features. #724
  • Distributed graph processing pipeline supports using HuggingFace models to tokenize textual node features. #700
  • Support running GraphStorm jobs on a single machine without using ssh. #712
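
To illustrate the hard-negative feature above with a minimal, framework-agnostic sketch (function and argument names here are hypothetical, not the GraphStorm API): user-supplied negatives for an edge are used first, and uniform random sampling fills any remainder.

```python
import random

def sample_negatives(edge, hard_negatives, all_dst_nodes, num_negatives):
    """Return negative destination nodes for one (src, dst) edge.

    Prefer the user-provided hard negatives for this edge; fall back to
    uniform random sampling over the remaining destination nodes if not
    enough hard negatives are given.
    """
    src, dst = edge
    negatives = list(hard_negatives.get(edge, []))[:num_negatives]
    # Pad with uniformly sampled nodes, excluding the true destination
    # and any negatives already chosen.
    candidates = [n for n in all_dst_nodes if n != dst and n not in negatives]
    while len(negatives) < num_negatives and candidates:
        pick = random.choice(candidates)
        negatives.append(pick)
        candidates.remove(pick)
    return negatives
```

For example, `sample_negatives((0, 1), {(0, 1): [5, 7]}, list(range(10)), 4)` returns the two user-defined negatives 5 and 7 first, then two randomly sampled ones.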

New Examples

  • Add the GPEFT method, which enhances an LLM with graph data, as a GraphStorm example built on the custom model interface. It trains a GNN model to encode the neighborhood of a target node as a prompt and performs parameter-efficient fine-tuning (PEFT) to enhance the LLM when computing the node representation of the target node. See the GPEFT example for how to run it. #673 #701
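
As a rough, framework-free illustration of the GPEFT idea described above (all names are hypothetical and this is not the example's actual code): the GNN embedding of the target node's neighborhood is projected into the LLM's embedding space and prepended as a soft prompt to the token embeddings.

```python
def build_prompted_input(gnn_embedding, token_embeddings, projector):
    """Prepend a graph-derived soft prompt to the LLM's token embeddings.

    gnn_embedding: a vector summarizing the target node's neighborhood.
    projector: maps the GNN vector into one or more prompt vectors in the
    LLM embedding space; in GPEFT-style training, roughly this projector
    and the PEFT adapters are what get updated, not the full LLM.
    """
    prompt_vectors = projector(gnn_embedding)
    # Soft prompt first, then the ordinary text token embeddings.
    return prompt_vectors + token_embeddings
```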

Minor features

  • Add support for balancing training/validation/test sets during graph partitioning for node classification tasks. #714 #741
  • Allow users to start training/inference jobs without specifying target_ntype/target_etype on homogeneous graphs. #686 #683
  • Unify the ID mapping output of GConstruct and GProcessing. #461

Breaking changes

  • Previously, GConstruct created the ID mappings as a single Parquet file, with its filename prefixed by the node type. Starting with the 0.2.2 release, GConstruct creates partitioned Parquet files for each node type under its own directory. This change unifies the output of GConstruct and GProcessing. See more details in #461.
  • We have unified the behavior of handling errors in evaluation functions. Previously, evaluation functions, such as roc_auc or f1 score, did not raise an exception when an error occurred. Starting with the 0.2.2 release, evaluation functions stop execution and raise an exception with a corresponding error message when an error occurs. See more details in #711.
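
The new fail-fast contract for evaluation functions can be sketched in plain Python (the function name and the pairwise AUC computation here are illustrative, not GraphStorm's actual implementation):

```python
def roc_auc_safe(labels, scores):
    """Sketch of the post-0.2.2 contract: fail fast on invalid input.

    Pre-0.2.2 behavior would swallow errors (e.g. silently return a
    placeholder) when the metric is undefined; now an exception stops
    the run with a clear message.
    """
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")
    if len(set(labels)) < 2:
        raise ValueError(
            "ROC AUC is undefined when only one class is present in labels"
        )
    # Minimal AUC via pairwise comparisons (O(n^2), fine for a sketch).
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```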

Contributors