Error while training #3

Open
yahooo-m opened this issue Nov 25, 2024 · 7 comments

@yahooo-m

Hi, when running train_joint.py, I hit the error "TypeError: VideoLISAForCausalLM.model_forward() missing 1 required positional argument: 'dense_indices'". I checked the input_dict, and it indeed does not contain that key; its keys are: ['image_paths', 'images', 'images_clip', 'input_ids', 'labels', 'attention_masks', 'masks_list', 'label_list', 'valid_indices', 'resize_list', 'offset', 'questions_list', 'sampled_classes_list', 'inference', 'conversation_list'].

@JosephPai
Collaborator

Hi, thanks for raising this issue.
'dense_indices' should be the 'valid_indices' in the input_dict.
I have updated dataset.py to use a consistent variable name.
Feel free to let me know if you have any other questions.
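For anyone on an older checkout who has not pulled the updated dataset.py yet, a minimal local workaround is to rename the key before the forward call. This is only a sketch: the key names come from this thread, but the surrounding training-loop variable (input_dict) is assumed:

# hypothetical workaround: map the old key name onto the argument
# that model_forward() expects; remove once dataset.py is updated
if "valid_indices" in input_dict and "dense_indices" not in input_dict:
    input_dict["dense_indices"] = input_dict.pop("valid_indices")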

@yahooo-m
Author

Thanks!
And how can I change the number of GPUs used for training?

@JosephPai
Collaborator

In the training script, if you want to train the model with only 4 GPUs, you can start the deepspeed job with:

deepspeed --include localhost:4,5,6,7 train_joint.py
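If the specific device IDs do not matter, the DeepSpeed launcher also accepts a plain GPU count (a standard DeepSpeed launcher option, not specific to this repo):

deepspeed --num_gpus=4 train_joint.py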

@Lexarymade

Hi, dear author, I'm wondering whether there will be evaluation scripts for the other datasets reported in the paper, e.g., Refer-DAVIS-17 and your proposed ReasonVOS dataset. Thanks!

@JosephPai
Collaborator

Hi @Lexarymade , we are actively working on organizing the data and evaluation scripts.
We just released the ReasonVOS benchmark: https://github.com/showlab/VideoLISA/blob/main/BENCHMARK.md
You can slightly modify the evaluation script of MeViS to evaluate the ReasonVOS benchmark, as their data structures are very similar.

As for Ref-DAVIS-17, its evaluation is a bit more involved because it relies on a separate evaluation toolkit. We will put together instructions on how to evaluate it soon.

@yahooo-m
Author

yahooo-m commented Nov 28, 2024

Hi, have you tried training the model on A100s? I find that it may take 8 days to train on 8 A100 GPUs. Is flash-attention not used in this project?

@JosephPai
Collaborator

Hi @yahooo-m , when developing this project we did not investigate the implementation with flash-attn. It seems that Phi-3 series models do not automatically enable flash-attn unless it is explicitly specified; this appears to be a common issue according to related upstream discussions.

To use flash-attn, you can modify the training script:

model = VideoLISAForCausalLM.from_pretrained(
    args.version, torch_dtype=torch_dtype, low_cpu_mem_usage=True,
    cache_dir="/home/ubuntu/.cache/huggingface/hub",
    attn_implementation="flash_attention_2",  # add this line
    **model_args
)
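Note that flash_attention_2 additionally requires the flash-attn package (pip install flash-attn --no-build-isolation) and a half-precision dtype (fp16/bf16). A quick sanity check after loading (the attribute below is a transformers internal, so treat this as a sketch rather than a stable API):

# confirm which attention backend the loaded model is using
print(model.config._attn_implementation)  # expect "flash_attention_2"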
