-
Notifications
You must be signed in to change notification settings - Fork 5.8k
New issue
Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? # to your account
fix: CogVideox train dataset _preprocess_data crop video #9574
fix: CogVideox train dataset _preprocess_data crop video #9574
Conversation
…orms` as it caused image overexposure. Added `_resize_for_rectangle_crop` function to enable video cropping functionality. The cropping mode can be configured via `video_reshape_mode`, supporting options: ['center', 'random', 'none'].
Cc: @a-r-r-o-w |
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update. |
frames = frames.float() | ||
frames = torch.stack([train_transforms(frame) for frame in frames], dim=0) | ||
videos.append(frames.permute(0, 3, 1, 2).contiguous()) # [F, C, H, W] | ||
tensor = frames.float() / 255.0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure why we are making the tensors to range [0, 1]
, instead of [-1, 1]
. In the original codebase, we convert to [-1, 1]
as well here if I understand correctly, yes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure why we are making the tensors to range
[0, 1]
, instead of[-1, 1]
. In the original codebase, we convert to[-1, 1]
as well here if I understand correctly, yes?
You're right, it should be in the [-1, 1] range. In fact, this is for matrix calculations during fine-tuning, and the [-1, 1] range is easier for computation. I forgot that this step is handled in the latten2img process, so the image is in the [0, 1] range, while the latent space is in the [-1, 1] range.
I've already verified that the cause of the training result showing a blank screen is that I input a 960x720 image into the dataset, and it was compressed to a 460x720 image for training directly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
https://github.com/THUDM/CogVideo/blob/111756a6a68a8df375ef9c31f9f325818699dfaa/sat/data_video.py#L437
The number 127.5 may experience precision loss during division operations.
encode : images / 255.0 * 2.0 - 1.0
decode: (images / 2 + 0.5).clamp(0, 1)
encode : (frames - 127.5) / 127.5
decode: (images / 2 + 0.5).clamp(0, 1)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, I have one question. Apart from that, this looks great! Could you also run make style
?
cc @yiyixuxu if we want the ipynb notebook here or not
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the improvements! Have you verified training on the new settings? I think it would be good to default --video_reshape_mode
to maintain compatibility with existing *.sh
scripts for launching training that others might have locally setup.
Could you host the ipynb notebook on https://gist.github.com/ instead of here, and link to it instead? We try to limit notebooks, pngs/mp4s/gifs and other files in the repo otherwise it can soon compound to a bulky clone.
I think make style
is also needed to quality tests
move video_fix_rgb_float_and_crop.ipynb to https://gist.github.com/glide-the/7658dbfd5f555be0a1a687a4139dba40 |
…#9574) * Removed int8 to float32 conversion (`* 2.0 - 1.0`) from `train_transforms` as it caused image overexposure. Added `_resize_for_rectangle_crop` function to enable video cropping functionality. The cropping mode can be configured via `video_reshape_mode`, supporting options: ['center', 'random', 'none']. * The number 127.5 may experience precision loss during division operations. * wandb request pil image Type * Resizing bug * del jupyter * make style * Update examples/cogvideo/README.md * make style --------- Co-authored-by: --unset <--unset> Co-authored-by: Aryan <aryan@huggingface.co>
* Removed int8 to float32 conversion (`* 2.0 - 1.0`) from `train_transforms` as it caused image overexposure. Added `_resize_for_rectangle_crop` function to enable video cropping functionality. The cropping mode can be configured via `video_reshape_mode`, supporting options: ['center', 'random', 'none']. * The number 127.5 may experience precision loss during division operations. * wandb request pil image Type * Resizing bug * del jupyter * make style * Update examples/cogvideo/README.md * make style --------- Co-authored-by: --unset <--unset> Co-authored-by: Aryan <aryan@huggingface.co>
Removed int8 to float32 conversion (
* 2.0 - 1.0
) fromtrain_transforms
as it caused image overexposure.Added
_resize_for_rectangle_crop
function to enable video cropping functionality. The cropping mode can be configured viavideo_reshape_mode
, supporting options: ['center', 'random', 'none'].@sayakpaul