SA-DVAE: Improving Zero-Shot Skeleton-Based Action Recognition by Disentangled Variational Autoencoders (ECCV 2024)
Sheng-Wei Li, Zi-Xiang Wei, Wei-Jie Chen, Yi-Hsin Yu, Chih-Yuan Yang†, Jane Yung-jen Hsu† (†corresponding authors)
SA-DVAE stands for Semantic Alignment via Disentangled Variational Autoencoders.
SA-DVAE improves zero-shot skeleton-based action recognition by aligning modality-specific VAEs and disentangling skeleton features into semantic and non-semantic parts, achieving better performance on NTU RGB+D, NTU RGB+D 120, and PKU-MMD datasets.
The codebase has been tested with the following setup:
- Operating System: Ubuntu 22.04
- Python Version: 3.10
- GPU: 1x NVIDIA RTX 3090 with CUDA version 12.7
- Clone the Repository

      git clone https://github.com/pha123661/SA-DVAE.git
      cd SA-DVAE

- Install Dependencies

      pip install -r requirements.txt

- Download Pre-extracted Features
  - Download the pre-extracted features for the NTU-60, NTU-120, and PKU-MMD datasets here.
  - Extract the resources.zip file.
  - Place all subdirectories under ./resources.
Optional: Generate Features Yourself
- Download the class descriptions at ./class_lists and the skeleton features at NTU RGB+D and PKUMMD.
- Use the sentence_transformers or transformers package to extract semantic features.
- Use mmaction2 to train and extract skeleton features.
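As a rough illustration of the semantic-feature step, the sketch below embeds class descriptions with sentence_transformers. It rests on several assumptions that are not from this repository: that a description file holds one description per line, that the helper names `load_descriptions` and `encode_descriptions` are purely illustrative, and that `all-MiniLM-L6-v2` is only a placeholder model rather than the encoder used in the paper.

```python
from pathlib import Path


def load_descriptions(path):
    """Read one class description per non-empty line (assumed file format)."""
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    return [line.strip() for line in lines if line.strip()]


def encode_descriptions(descriptions, model_name="all-MiniLM-L6-v2"):
    """Embed each description with a sentence_transformers model.

    model_name is a placeholder and may differ from the encoder used in the paper.
    """
    # Imported lazily so load_descriptions works without the package installed.
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer(model_name)
    # encode() returns one embedding per input string.
    return model.encode(descriptions)
```

Calling `encode_descriptions(load_descriptions(path))` on a file from ./class_lists would yield one embedding per description, provided the file matches the assumed one-per-line format.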
- Ensure that the directory structure is as follows:

      SA-DVAE
      ├── resources
      │   ├── label_splits
      │   ├── sk_feats
      │   └── text_feats
      ...
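A quick way to confirm the layout before training is to check for the three listed subdirectories. This is a minimal sketch, not part of the repository; the function name `missing_resource_dirs` is hypothetical, and it checks only the directories named in the tree, not anything covered by the trailing `...`.

```python
from pathlib import Path

# Subdirectories named in the tree; anything elided by "..." is not checked.
EXPECTED_SUBDIRS = ("label_splits", "sk_feats", "text_feats")


def missing_resource_dirs(repo_root="."):
    """Return the expected subdirectories of <repo_root>/resources that are absent."""
    resources = Path(repo_root) / "resources"
    return [name for name in EXPECTED_SUBDIRS if not (resources / name).is_dir()]
```

Running `missing_resource_dirs(".")` from the repository root returns an empty list when all three directories are in place.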
We provide three training scripts in ./scripts, one for each of the three main experiments in our paper:
- Comparison with SOTA Methods

      ./scripts/train_eval_synse_split.sh {dataset}

- Random Class Splits

      ./scripts/train_eval_average_random_split.sh {dataset}

  - This script runs experiments on three different seen/unseen class splits.

- Enhanced Class Descriptions by a Large Language Model (LLM)

      ./scripts/train_eval_llm_descriptions.sh {dataset}

  - This script runs experiments on three different seen/unseen class splits.

where dataset should be one of ntu60, ntu120, and pku51.
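To launch the scripts from Python, for example to loop over datasets, a thin wrapper can validate the dataset name first. The sketch below is illustrative only: the script paths and dataset names come from the list above, while the short experiment keys (`sota`, `random`, `llm`) and the function names are made up for this example.

```python
import subprocess

VALID_DATASETS = ("ntu60", "ntu120", "pku51")

# Keys are hypothetical shorthand; values are the script paths listed above.
SCRIPTS = {
    "sota": "./scripts/train_eval_synse_split.sh",
    "random": "./scripts/train_eval_average_random_split.sh",
    "llm": "./scripts/train_eval_llm_descriptions.sh",
}


def build_command(experiment, dataset):
    """Return the argument list for one experiment, rejecting unknown names."""
    if experiment not in SCRIPTS:
        raise ValueError(f"unknown experiment: {experiment!r}")
    if dataset not in VALID_DATASETS:
        raise ValueError(f"dataset must be one of {VALID_DATASETS}, got {dataset!r}")
    return [SCRIPTS[experiment], dataset]


def run_experiment(experiment, dataset):
    """Run the script from the repository root; raises if it exits non-zero."""
    return subprocess.run(build_command(experiment, dataset), check=True)
```

For instance, `run_experiment("sota", "ntu60")` would invoke `./scripts/train_eval_synse_split.sh ntu60`.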
Each training script follows these four stages, covering both Zero-Shot Learning (ZSL) and Generalized Zero-Shot Learning (GZSL) training and evaluation:
- Train and evaluate SA-DVAE for ZSL.
- Prepare $\mathbf{p}_s$ and $\mathbf{p}_u$ for Domain Classifier Training.
- Train the Domain Classifier.
- Evaluate SA-DVAE under GZSL.
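This README does not spell out how the domain classifier is used at GZSL evaluation time. The sketch below shows one common gating formulation and should be read as an assumption, not as the authors' implementation: it takes $\mathbf{p}_s$ and $\mathbf{p}_u$ to be class probabilities over the seen and unseen classes, and treats the domain classifier's output as the probability that a sample belongs to a seen class. The function name `gzsl_predict` and its arguments are hypothetical.

```python
def gzsl_predict(p_s, p_u, prob_seen, seen_labels, unseen_labels):
    """Pick a label over seen and unseen classes using a domain gate.

    p_s: probabilities over seen classes (assumed meaning of p_s).
    p_u: probabilities over unseen classes (assumed meaning of p_u).
    prob_seen: assumed domain-classifier output, P(sample is from a seen class).
    """
    if len(p_s) != len(seen_labels) or len(p_u) != len(unseen_labels):
        raise ValueError("each probability list must match its label list")
    if not 0.0 <= prob_seen <= 1.0:
        raise ValueError("prob_seen must lie in [0, 1]")

    scores = {}
    # Seen-class scores are weighted by the gate, unseen-class scores by its complement.
    for label, p in zip(seen_labels, p_s):
        scores[label] = prob_seen * p
    for label, p in zip(unseen_labels, p_u):
        scores[label] = (1.0 - prob_seen) * p
    # The predicted class is the one with the highest gated score.
    return max(scores, key=scores.get)
```

With a gate that leans toward "unseen" (a low `prob_seen`), a confident seen-class prediction can still be overridden by an unseen class, which is the behavior a GZSL gate is meant to provide.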
Our codebase is mainly built upon skelemoa/synse-zsl. We thank the authors for their excellent work.
@inproceedings{li2024sadvae,
title={SA-DVAE: Improving Zero-Shot Skeleton-Based Action Recognition by Disentangled Variational Autoencoders},
author={Sheng-Wei Li and Zi-Xiang Wei and Wei-Jie Chen and Yi-Hsin Yu and Chih-Yuan Yang and Jane Yung-jen Hsu},
booktitle={European Conference on Computer Vision (ECCV)},
year={2024}
}