
V 0.2.2 Release Note

@classicsong classicsong released this 26 Feb 20:35

GraphStorm V0.2.2 contains several major feature enhancements. In this release, we have enhanced NVIDIA WholeGraph support to speed up access to learnable embeddings during training and to cached BERT embeddings. We have added a customized negative sampling method for link prediction tasks, which lets users define negative edges for each individual edge. We have added two new feature transformations to our distributed graph processing pipeline: textual feature tokenization with HuggingFace models and textual feature encoding with HuggingFace models. We have also simplified the command line interface for model prototyping by removing the requirement to set up ssh for running GraphStorm jobs on a single machine. Finally, we have added an example that uses the custom model interface to perform GPEFT training, enhancing an LLM with graph data.

Major features

  • Support using WholeGraph distributed embedding to speed up learnable embedding training. #677 #697 #734
  • Support using WholeGraph distributed embedding to speed up reading cached BERT embeddings. #737
  • Support hard negatives for link prediction tasks. #678 #684 #703
  • Distributed graph processing pipeline supports using HuggingFace models to encode textual node features. #724
  • Distributed graph processing pipeline supports using HuggingFace models to tokenize textual node features. #700
  • Support running GraphStorm jobs on a single machine without using ssh. #712
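
To illustrate the hard-negative feature above with a minimal, framework-agnostic sketch (function and argument names here are hypothetical, not the GraphStorm API): user-supplied negatives for an edge are used first, and uniform random sampling fills any remainder.

```python
import random

def sample_negatives(edge, hard_negatives, all_dst_nodes, num_negatives):
    """Return negative destination nodes for one (src, dst) edge.

    Prefer the user-provided hard negatives for this edge; fall back to
    uniform random sampling over the remaining destination nodes if not
    enough hard negatives are given.
    """
    src, dst = edge
    negatives = list(hard_negatives.get(edge, []))[:num_negatives]
    # Pad with uniformly sampled nodes, excluding the true destination
    # and any negatives already chosen.
    candidates = [n for n in all_dst_nodes if n != dst and n not in negatives]
    while len(negatives) < num_negatives and candidates:
        pick = random.choice(candidates)
        negatives.append(pick)
        candidates.remove(pick)
    return negatives
```

For example, `sample_negatives((0, 1), {(0, 1): [5, 7]}, list(range(10)), 4)` returns the two user-defined negatives 5 and 7 first, then two randomly sampled ones.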

New Examples

  • Add the GPEFT method, which enhances an LLM with graph data, as a GraphStorm example built on the custom model interface. It trains a GNN model to encode the neighborhood of a target node as a prompt and performs parameter-efficient fine-tuning (PEFT) to enhance the LLM when computing the node representation of the target node. See the GPEFT example for how to run it. #673 #701
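
As a rough, framework-free illustration of the GPEFT idea described above (all names are hypothetical and this is not the example's actual code): the GNN embedding of the target node's neighborhood is projected into the LLM's embedding space and prepended as a soft prompt to the token embeddings.

```python
def build_prompted_input(gnn_embedding, token_embeddings, projector):
    """Prepend a graph-derived soft prompt to the LLM's token embeddings.

    gnn_embedding: a vector summarizing the target node's neighborhood.
    projector: maps the GNN vector into one or more prompt vectors in the
    LLM embedding space; in GPEFT-style training, roughly this projector
    and the PEFT adapters are what get updated, not the full LLM.
    """
    prompt_vectors = projector(gnn_embedding)
    # Soft prompt first, then the ordinary text token embeddings.
    return prompt_vectors + token_embeddings
```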

Minor features

  • Add support for balancing training/validation/test sets during graph partitioning for node classification tasks. #714 #741
  • Allow users to start training/inference jobs without specifying target_ntype/target_etype on homogeneous graphs. #686 #683
  • Unify the ID mapping output of GConstruct and GProcessing. #461

Breaking changes

  • Previously, GConstruct created the ID mappings as a single Parquet file, with its filename prefixed by the node type. Starting with the 0.2.2 release, GConstruct creates partitioned Parquet files for each node type under its own directory. This change unifies the output of GConstruct and GProcessing. See more details in #461.
  • We have unified the behavior of handling errors in evaluation functions. Previously, evaluation functions, such as roc_auc or f1 score, did not raise an exception when an error occurred. Starting with the 0.2.2 release, evaluation functions stop execution and raise an exception with a corresponding error message when an error occurs. See more details in #711.
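
The new fail-fast contract for evaluation functions can be sketched in plain Python (the function name and the pairwise AUC computation here are illustrative, not GraphStorm's actual implementation):

```python
def roc_auc_safe(labels, scores):
    """Sketch of the post-0.2.2 contract: fail fast on invalid input.

    Pre-0.2.2 behavior would swallow errors (e.g. silently return a
    placeholder) when the metric is undefined; now an exception stops
    the run with a clear message.
    """
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")
    if len(set(labels)) < 2:
        raise ValueError(
            "ROC AUC is undefined when only one class is present in labels"
        )
    # Minimal AUC via pairwise comparisons (O(n^2), fine for a sketch).
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```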

Contributors