Add TextToImage, StableDiffusion3Backbone and StableDiffusion3TextToImage #1816

Closed
wants to merge 25 commits into from

Conversation

james77777778
Collaborator

@james77777778 james77777778 commented Sep 10, 2024

This is more of a draft, as we may need further discussion regarding the implementation.

Notes for reviewing:

  • There are several small modifications in this PR to ensure the numerical stability of all modules/layers (e.g., layer normalization and softmax should run in float32).
  • StableDiffusion3Backbone is a large model that includes all necessary submodules, resulting in a very long init signature. Is this acceptable? How could we refactor it?
  • I have figured out how to drop T5 (another 5B model): simply skip it and zero-pad the embeddings from CLIP models.
  • Some ideas were borrowed from https://github.com/huggingface/diffusers , which helped simplify the implementation.
  • Defining a functional model in StableDiffusion3Backbone or StableDiffusion3TextToImage is challenging for me. It may be unnecessary for pure inference purposes.
  • I couldn't compile the entire text_to_image function due to unexpected OOM issues. However, when I split it into encode, denoise and decode functions, it worked fine.
  • I have written a rough but functional script that can convert the weights directly from https://huggingface.co/stabilityai/stable-diffusion-3-medium. Please refer to the colab.
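
To illustrate the float32 note above, here is a minimal NumPy sketch (not the code in this PR) of running softmax in float32 when the inputs are float16. The helper name `softmax_in_float32` is hypothetical and only shows the upcast/downcast pattern.

```python
import numpy as np


def softmax_in_float32(logits):
    """Compute softmax in float32, then cast back to the input dtype.

    Hypothetical helper for illustration only; not part of this PR.
    """
    x = logits.astype(np.float32)
    # Subtract the row-wise max so the exponentials cannot overflow.
    x = x - x.max(axis=-1, keepdims=True)
    e = np.exp(x)
    probs = e / e.sum(axis=-1, keepdims=True)
    return probs.astype(logits.dtype)


# Half-precision attention logits, as they might appear under mixed precision.
logits = np.array([[12.0, 11.0, -4.0]], dtype=np.float16)
probs = softmax_in_float32(logits)
print(probs.dtype, probs)
```

In Keras the same effect is usually obtained by passing `dtype="float32"` to the layer so that it computes in float32 regardless of the global dtype policy.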
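
The T5-free path mentioned in the notes could look roughly like the sketch below, written in NumPy rather than taken from the PR. The function name `pad_clip_embeddings`, the two CLIP widths (768 and 1280), the sequence length 77 and the target width 4096 are all assumptions for illustration.

```python
import numpy as np


def pad_clip_embeddings(clip_l, clip_g, target_dim=4096):
    """Concatenate two CLIP embeddings and zero-pad the feature axis.

    Hypothetical helper: the zero padding fills the extra feature width
    so the T5 encoder does not have to be loaded at all.
    """
    joint = np.concatenate([clip_l, clip_g], axis=-1)
    pad = target_dim - joint.shape[-1]
    if pad < 0:
        raise ValueError("target_dim is smaller than the joint CLIP width")
    # Pad only the last (feature) axis; np.pad fills with zeros by default.
    return np.pad(joint, [(0, 0), (0, 0), (0, pad)])


# Assumed shapes: (batch, sequence_length, hidden_dim).
clip_l = np.ones((1, 77, 768), dtype="float32")
clip_g = np.ones((1, 77, 1280), dtype="float32")
padded = pad_clip_embeddings(clip_l, clip_g)
print(padded.shape)  # (1, 77, 4096)
```

Skipping T5 this way trades some prompt fidelity for a much smaller memory footprint, which is presumably why it is attractive for a first inference-only version.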

Demo colab:
https://colab.research.google.com/drive/1rrQMs0nlKSEzYNhIJChQwgnrZNiydexS?usp=sharing

[Images: "a cat holding a sign that says hello world" | "cute wallpaper art of a cat"]

TODO:

  • Rename model folder to stable_diffusion_3
  • Add docstrings
  • Add weight conversion script
  • Add tests

@divyashreepathihalli @mattdangerw @SamanehSaadat

BTW, I will be unavailable from 9/17 to 9/22.

divyashreepathihalli and others added 25 commits August 12, 2024 17:17
* Agg Vgg16 backbone

* update names

* update tests

* update test

* add image classifier

* incorporate review comments

* Update test case

* update backbone test

* add image classifier

* classifier cleanup

* code reformat

* add vgg16 image classifier

* make vgg generic

* update doc string

* update docstring

* add classifier test

* update tests

* update docstring

* address review comments

* code reformat

* update the configs

* address review comments

* fix task saved model test

* update init

* code reformatted
* Add ResNetV1 and ResNetV2

* Address comments
* Add CSP DarkNet

* Add CSP DarkNet

* snake_case function names

* change use_depthwise to block_type
…Backbone` (keras-team#1769)

* Add FeaturePyramidBackbone and update ResNetBackbone

* Simplify the implementation

* Fix CI

* Make ResNetBackbone compatible with timm and add FeaturePyramidBackbone

* Add conversion implementation

* Update docstrings

* Address comments
* Add DenseNet

* fix testcase

* address comments

* nit

* fix lint errors

* move description
* add vit det vit_det_backbone

* update docstring

* code reformat

* fix tests

* address review comments

* bump year on all files

* address review comments

* rename backbone

* fix tests

* change back to ViT

* address review comments

* update image shape
* Add MixTransformer

* fix testcase

* test changes and comments

* lint fix

* update config list

* modify testcase for 2 layers
* update input_image_shape -> image_shape

* update docstring example

* code reformat

* update tests
add missing __init__ file to vit_det
This is a temporary way to test out the keras-hub branch.
- Does a global rename of all symbols during package build.
- Registers the "old" name on symbol export for saving compat.
- Adds a github action to publish every commit to keras-hub as
  a new package.
- Removes our descriptions on PyPI temporarily, until we want
  to message this more broadly.
* Add `CLIPTokenizer`, `T5XXLTokenizer`, `CLIPTextEncoder` and `T5XXLTextEncoder`.

* Make CLIPTextEncoder as Backbone

* Add `T5XXLPreprocessor` and remove `T5XXLTokenizer`

Add `CLIPPreprocessor`

* Use `tf = None` at the top

* Replace manual implementation of `CLIPAttention` with `MultiHeadAttention`
* Bounding box utils

* - Correct test cases

* - Remove hard tensorflow dtype

* - fix api gen

* - Fix import for test cases
- Use setup for converters test case

* - fix api_gen issue

* - FIx api gen

* - Fix api gen error

* - Correct test cases as per new api changes
* mobilenet_v3 added in keras-nlp

* minor bug fixed in mobilenet_v3_backbone

* formatting corrected

* refactoring backbone

* correct_pad_downsample method added

* refactoring backbone

* parameters updated

* Testcaseupdated, expected output shape corrected

* code formatted with black

* testcase updated

* refactoring and description added

* comments updated

* added mobilenet v1 and v2

* merge conflict resolved

* version arg removed, and config options added

* input_shape changed to image_shape in arg

* config updated

* input shape corrected

* comments resolved

* activation function format changed

* minor bug fixed

* minor bug fixed

* added vision_backbone_test

* channel_first bug resolved

* channel_first cases working

* comments  resolved

* formatting fixed

* refactoring

---------

Co-authored-by: ushareng <usha.rengaraju@gmail.com>
* migrating efficientnet models to keras-hub

* merging changes from other sources

* autoformatting pass

* initial consolidation of efficientnet_backbone

* most updates and removing separate implementation

* cleanup, autoformatting, keras generalization

* removed layer examples outside of effiicient net

* many, mainly documentation changes, small test fixes
* Add ResNet_vd to ResNet backbone

* Addressed requested parameter changes

* Fixed tests and updated comments

* Added new parameters to docstring
* Add `VAEImageDecoder` for StableDiffusionV3

* Use `keras.Model` for `VAEImageDecoder` and follows the coding style in `VAEAttention`
* add pyramid outputs

* fix testcase

* format fix

* make common testcase for pyramid outputs

* change default shape

* simplify testcase

* test case change and add channel axis
* Add `MMDiT`

* Update

* Update

* Update implementation
* - Add formats, iou, utils for bounding box

* - Add `AnchorGenerator`, `BoxMatcher` and `NonMaxSupression` layers

* - Remove scope_name  not required.

* use default keras name scope

* - Correct format error

* - Remove layers as of now and keep them at model level till keras core supports them

* - Correct api_gen
@divyashreepathihalli divyashreepathihalli added the kokoro:force-run Runs Tests on GPU label Sep 10, 2024
@kokoro-team kokoro-team removed the kokoro:force-run Runs Tests on GPU label Sep 10, 2024
@mattdangerw mattdangerw force-pushed the keras-hub branch 2 times, most recently from 1826dce to 753047d Compare September 11, 2024 00:01
@james77777778
Collaborator Author

james77777778 commented Sep 11, 2024

The commit history seems chaotic. I will try to rebase it today.

@james77777778
Collaborator Author

Since the keras-hub branch was force-pushed, I was unable to rebase it. Therefore, I submitted a new PR for SD3:
#1820

@james77777778 james77777778 deleted the add-sdv3 branch October 3, 2024 04:29

9 participants