GraphStorm v0.2.1 release · awslabs/graphstorm

The GraphStorm v0.2.1 release contains several major feature enhancements. Model inference now automatically maps inference results (prediction results and generated node embeddings) into the raw node ID space, i.e., the same ID space as the raw input data, and stores the output in parquet format. A new inference command (graphstorm.run.gs_gen_node_embedding) computes node embeddings on any given graph with a trained GraphStorm model. The distributed graph processing pipeline now provides multiple feature transformations, including categorical feature transformation and numerical bucketing. A GAT model has been added to the GraphStorm model zoo, along with a demo of running GraphStorm in a Jupyter Notebook.

Major features

Automatically map inference results (prediction results and generated node embeddings) into Raw Node ID space (#481, #524, #527, #543, #533, #578, #597, #621, #633, #641)
Provide a command line to generate GNN embeddings (#478)
Provide multiple feature transformations, including categorical feature transformation (#623), Rank-Gauss (#615), numerical bucketing (#583), and Min/Max normalization (#575)
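
As a rough illustration of what the transformations named above compute, here is a minimal standard-library sketch. It is not GraphStorm's implementation: the function names, the equal-width bucketing with clamping, and the rank-to-quantile rule used for Rank-Gauss are all assumptions made for this example, and GraphStorm's own options and edge-case handling may differ.

```python
from statistics import NormalDist


def min_max_normalize(values):
    """Scale values linearly into [0, 1]; a constant column maps to 0.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def bucketize(values, lo, hi, num_buckets):
    """One-hot encode each value by equal-width bucket over [lo, hi].

    Out-of-range values are clamped into the first or last bucket
    (an assumption of this sketch).
    """
    width = (hi - lo) / num_buckets
    rows = []
    for v in values:
        idx = int((v - lo) // width)
        idx = max(0, min(num_buckets - 1, idx))
        row = [0] * num_buckets
        row[idx] = 1
        rows.append(row)
    return rows


def one_hot_categories(values):
    """Map each distinct category to a one-hot vector (categories sorted)."""
    categories = sorted(set(values))
    index = {c: i for i, c in enumerate(categories)}
    rows = [[1 if index[v] == j else 0 for j in range(len(categories))]
            for v in values]
    return categories, rows


def rank_gauss(values):
    """Replace each value by the normal quantile of its rank.

    Ties are broken by input order, which is a simplification.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    normal = NormalDist()
    out = [0.0] * n
    for rank, i in enumerate(order):
        out[i] = normal.inv_cdf((rank + 0.5) / n)
    return out
```

For example, `min_max_normalize([2.0, 4.0, 6.0])` maps the middle value to 0.5, and `rank_gauss` sends the median of an odd-length column to 0.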

Minor features

Support caching BERT embeddings on disk for GNN model fine-tuning (#516)
Allow customization of GLEM trainable parameter grouping (#506)
Support using NVIDIA WholeGraph to store edge features (#555)
Add contrastive loss for link prediction tasks (#619)
Support in-batch negatives for link prediction tasks (#596)
Support NCCL backend for sparse embeddings (#549)
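
To make the two link prediction items above concrete, the following is a generic sketch of a contrastive loss with in-batch negatives, written in plain Python. It illustrates the general technique only and is not taken from GraphStorm's code: the function name, the dot-product scorer, and the `temperature` parameter are assumptions. For each positive (source, destination) pair in a batch, the destinations of the other pairs serve as negatives, and the loss is a softmax cross-entropy over those scores.

```python
import math


def in_batch_contrastive_loss(src_embs, dst_embs, temperature=1.0):
    """Mean softmax cross-entropy over a batch of positive pairs.

    Row i of src_embs is paired with row i of dst_embs; every other
    destination in the batch acts as a negative for source i.
    """
    n = len(src_embs)
    total = 0.0
    for i in range(n):
        # Dot-product score between source i and every destination in the batch.
        scores = [
            sum(a * b for a, b in zip(src_embs[i], dst)) / temperature
            for dst in dst_embs
        ]
        # Numerically stable log-sum-exp over all candidates.
        top = max(scores)
        log_sum = top + math.log(sum(math.exp(s - top) for s in scores))
        # Negative log-probability of the true destination.
        total += log_sum - scores[i]
    return total / n
```

With this formulation, a batch whose sources align with their own destinations yields a lower loss than the same batch with the destinations shuffled.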

New Built-in Models

GAT (#602, #607)

Breaking changes

We changed the file format and the content of the node embeddings and prediction results saved by the GraphStorm training and inference pipelines. By default, if a task is launched through a command under graphstorm.run.*, GraphStorm automatically saves the generated node embeddings and prediction results in parquet files. For node embeddings, the files contain two columns: “nid”, which stores the node IDs in the raw node ID space, and “emb”, which stores the node embeddings. For node prediction results, the files contain two columns: “nid”, which stores the node IDs in the raw node ID space, and “pred”, which stores the prediction results. For edge prediction results, the files contain three columns: “src_nid” and “dst_nid”, which store the raw node IDs of the source and destination nodes respectively, and “pred”, which stores the prediction results.
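
Downstream code that reads these outputs can rely on the column names listed above. The sketch below assumes pandas and NumPy are installed (plus a parquet engine such as pyarrow for the file reader); the helper names and the idea of passing in a list of part-file paths are illustrative choices, not part of GraphStorm. The small in-memory frame at the end stands in for a real output file.

```python
import numpy as np
import pandas as pd


def load_parquet_parts(paths):
    """Concatenate several parquet part files into one DataFrame."""
    return pd.concat([pd.read_parquet(p) for p in paths], ignore_index=True)


def embedding_matrix(emb_df):
    """Split a node-embedding frame into raw IDs and a 2-D matrix.

    Expects the 'nid' and 'emb' columns described in the release notes.
    """
    raw_ids = emb_df["nid"].to_numpy()
    matrix = np.stack(emb_df["emb"].tolist())
    return raw_ids, matrix


def prediction_lookup(pred_df):
    """Build a raw-node-ID -> prediction dict from 'nid' and 'pred' columns."""
    return dict(zip(pred_df["nid"].tolist(), pred_df["pred"].tolist()))


# In-memory stand-in for a saved embedding file (made-up values).
example = pd.DataFrame({"nid": ["a", "b"], "emb": [[1.0, 2.0], [3.0, 4.0]]})
ids, mat = embedding_matrix(example)
```

Because the “nid” column already holds raw node IDs, the results can be joined directly against the original input tables without a separate ID-mapping step.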

Contributors

Da Zheng from AWS
Xiang Song from AWS
Jian Zhang from AWS
Theodore Vasiloudis from AWS
Runjie Ma from AWS
Israt Nisa from AWS
Qi Zhu from AWS
Zichen Wang from AWS
Nicolas Castet from NVidia
Chang Liu from NVidia
