
[CVPRW] 2024 AI City Challenge: Multi-perspective Traffic Video Description Model with Fine-grained Refinement Approach

  • Paper: [To be updated]
  • Contributors: Tuan-An To, Minh-Nam Tran, Trong-Bao Ho, Thien-Loc Ha, Quang-Tan Nguyen, Hoang-Chau Luong, Thanh-Duy Cao

Framework

(Framework overview diagram)

Setup

virtualenv venv
source venv/bin/activate
pip install -r requirements.txt

Single-view

Preprocessing

  1. Convert the WTS and BDD_PC_5K annotations to the YouCook format.

Convert the competition dataset to YouCook-format annotation files by running the following commands:

python bdd5k2youcook_by_phase.py --videos_dir datasets/BDD_PC_5K/videos --caption_dir datasets/BDD_PC_5K/captions --output_dir annotations/BDD_PC_5K --merge_val

python wts2youcook_by_phase.py --videos_dir datasets/WTS/videos --caption_dir datasets/WTS/captions --output_dir annotations/WTS --merge_val

where the --merge_val flag merges the validation set into the training set, --videos_dir is the path to the video directory, --caption_dir is the path to the caption directory, and --output_dir is the path to the output directory. A sketch for inspecting the resulting annotation files is given after the directory layouts below.

The structure of the video directory of BDD_PC_5K should be as follows:

bdd_pc_5k_video_dir/
├── train/
│   └── vehicle_view/
│       ├── video1.mp4
│       ├── video2.mp4
│       └── ...
└── val/
    └── vehicle_view/
        ├── video1.mp4
        ├── video2.mp4
        └── ...

and the structure of the video directory of WTS should be as follows:

wts_video_dir/
├── train/
│   ├── video_id_1/
│   │   ├── overhead_view/
│   │   │   ├── abc.mp4
│   │   │   └── ...
│   │   └── vehicle_view/
│   │       ├── abc.mp4
│   │       └── ...
│   ├── video_id_2/
│   │   └── overhead_view/
│   │       ├── abc.mp4
│   │       └── ...
│   └── ...
└── val/
    ├── video_id_1/
    │   ├── overhead_view/
    │   │   ├── abc.mp4
    │   │   └── ...
    │   └── vehicle_view/
    │       ├── abc.mp4
    │       └── ...
    ├── video_id_2/
    └── ...

The caption directory structures are similar to the corresponding video directory structures.
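For reference, the converted YouCook-format files store, for each video, its duration and a list of timestamped caption segments. The snippet below is a minimal sketch for inspecting one of the output files; the file name annotations/WTS/train.json and the key names (database, annotations, segment, sentence) are assumptions based on the standard YouCook2 layout and may differ from what the conversion scripts actually emit.

import json

# Minimal sketch: inspect a converted annotation file.
# NOTE: the path and key names below assume the standard YouCook2 layout
# ("database" -> video_id -> "annotations" with "segment"/"sentence");
# adjust them to whatever wts2youcook_by_phase.py actually writes.
with open("annotations/WTS/train.json") as f:
    data = json.load(f)

for video_id, entry in list(data["database"].items())[:3]:
    print(video_id, "duration:", entry.get("duration"))
    for ann in entry["annotations"]:
        start, end = ann["segment"]
        print(f"  [{start:.1f}s - {end:.1f}s] {ann['sentence'][:80]}")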

  2. Construct a CSV file in which each row is a <video_path>, <feature_path> pair, for example: data/video1.mp4, data/video1.npy. Then extract the features (a manual way to build this CSV is sketched after the commands below):
cd vid2seq
python extract/make_csv.py
python extract/extract.py --csv <csv_path> --framerate 30
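If extract/make_csv.py does not match your directory layout, the CSV can also be built by hand. The sketch below pairs each .mp4 in a video folder with the .npy path its features should be written to; the folder names and the video_path/feature_path header row are assumptions, so check extract/extract.py for the exact columns it expects.

import csv
from pathlib import Path

# Minimal sketch: build a <video_path>,<feature_path> CSV by hand.
# NOTE: the directory names are placeholders, and the header row is an
# assumption -- check extract/extract.py for the columns it expects.
video_dir = Path("data/videos")
feature_dir = Path("data/features")

rows = [
    (str(video), str(feature_dir / (video.stem + ".npy")))
    for video in sorted(video_dir.glob("*.mp4"))
]

with open("videos.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["video_path", "feature_path"])
    writer.writerows(rows)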

Inference

  1. Open notebooks/vid2seq_inference.ipynb, set the checkpoint path and the test-set embedded feature path, then run the notebook.
  2. Set the checkpoint path and the test-set embedded feature path in scripts/get_submission.sh, then run:
cd vid2seq
bash ../scripts/get_submission.sh

Training

  1. Upload the train-set embedded features to Kaggle.
  2. Run the training notebook on Kaggle: [TRAINING] SINGLE VIEW MODEL.

Motion-Blur

Please check the Motion-Blur Link.

Multi-view

Please check the Multiview Link.

Acknowledgement

We would like to thank the authors of the Vid2Seq repository for their outstanding video captioning codebase.
