
Support for Kolors #8801

Closed
2 tasks done
JincanDeng opened this issue Jul 6, 2024 · 7 comments · Fixed by #8812
Comments

@JincanDeng

JincanDeng commented Jul 6, 2024

Model/Pipeline/Scheduler description

Yesterday, Kwai-Kolors published their new model, Kolors, which uses a UNet backbone and ChatGLM3 as the text encoder.

Kolors is a large-scale text-to-image generation model based on latent diffusion, developed by the Kuaishou Kolors team. Trained on billions of text-image pairs, Kolors exhibits significant advantages over both open-source and closed-source models in visual quality, complex semantic accuracy, and text rendering for both Chinese and English characters. Furthermore, Kolors supports both Chinese and English inputs, demonstrating strong performance in understanding and generating Chinese-specific content.

Open source status

  • The model implementation is available.
  • The model weights are available (Only relevant if addition is not a scheduler).

Provide useful links for the implementation

Implementation: https://github.com/Kwai-Kolors/Kolors
Weights: https://huggingface.co/Kwai-Kolors/Kolors

@asomoza
Member

asomoza commented Jul 6, 2024

Hi, thanks for your work, it's a nice model.

The weights seem to be saved with errors. The diffusion_pytorch_model.safetensors file, which should hold the float32 weights, seems to be the float16 one, and the float16 variant throws an error. I can open a PR to fix it if you want.

If you fix that, we can load the model like this:

import torch

# imports as in the Kolors reference repo (until transformers/diffusers add native support)
from kolors.models.modeling_chatglm import ChatGLMModel
from kolors.models.tokenization_chatglm import ChatGLMTokenizer
from kolors.pipelines.pipeline_stable_diffusion_xl_chatglm_256 import StableDiffusionXLPipeline

text_encoder = ChatGLMModel.from_pretrained("Kwai-Kolors/Kolors", subfolder="text_encoder", torch_dtype=torch.float16)
tokenizer = ChatGLMTokenizer.from_pretrained("Kwai-Kolors/Kolors", subfolder="text_encoder")

pipe = StableDiffusionXLPipeline.from_pretrained(
    "Kwai-Kolors/Kolors",
    tokenizer=tokenizer,
    text_encoder=text_encoder,
    torch_dtype=torch.float16,
    variant="fp16",
).to("cuda")

So basically, for this model to work with diffusers without additional dependencies, we'll just need transformers to add support for ChatGLM, and diffusers to add support for it in encode_prompt (a rough sketch of what that involves is below).
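For reference, this is roughly what encode_prompt would need to do with ChatGLM. It is only a sketch based on my reading of the reference Kolors pipeline; the hidden-state indices and the 256-token context are taken from there and should be double-checked:

import torch
from kolors.models.modeling_chatglm import ChatGLMModel
from kolors.models.tokenization_chatglm import ChatGLMTokenizer

tokenizer = ChatGLMTokenizer.from_pretrained("Kwai-Kolors/Kolors", subfolder="text_encoder")
text_encoder = ChatGLMModel.from_pretrained(
    "Kwai-Kolors/Kolors", subfolder="text_encoder", torch_dtype=torch.float16
).to("cuda")

text_inputs = tokenizer(
    ["a photo of an astronaut riding a horse"],
    padding="max_length",
    max_length=256,  # Kolors uses a 256-token context
    truncation=True,
    return_tensors="pt",
).to("cuda")

output = text_encoder(
    input_ids=text_inputs["input_ids"],
    attention_mask=text_inputs["attention_mask"],
    position_ids=text_inputs["position_ids"],
    output_hidden_states=True,
)

# Penultimate hidden state as the per-token embeddings (ChatGLM is
# sequence-first, so permute to batch-first); last token of the final
# hidden state as the pooled embedding.
prompt_embeds = output.hidden_states[-2].permute(1, 0, 2)
pooled_prompt_embeds = output.hidden_states[-1][-1]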

cc: @yiyixuxu @sayakpaul

@CanvaChen
Contributor

CanvaChen commented Jul 6, 2024

@asomoza We actually don't need to integrate the ChatGLM code directly into transformers. Instead, we can simply use the existing trust_remote_code infrastructure, similar to the following code snippet:

import torch
from transformers import AutoModel, AutoTokenizer

text_encoder = AutoModel.from_pretrained("THUDM/chatglm3-6b", torch_dtype=torch.float16, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm3-6b", trust_remote_code=True)

@s9anus98a

s9anus98a commented Jul 6, 2024

I keep getting out-of-memory crashes (on free Colab, under 12 GB VRAM) even with the text encoder quantized to 4-bit:

import torch
from transformers import AutoModel

# .quantize(4) is the 4-bit quantization helper shipped with ChatGLM's remote code
text_encoder = AutoModel.from_pretrained("THUDM/chatglm3-6b", torch_dtype=torch.float16, trust_remote_code=True).quantize(4).cuda()
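Something that might help (a sketch, untested on free Colab, and it assumes the Kolors reference pipeline inherits diffusers' DiffusionPipeline so that enable_model_cpu_offload is available): let accelerate move each component to the GPU only while it is in use, instead of keeping everything resident:

import torch
from kolors.models.modeling_chatglm import ChatGLMModel
from kolors.models.tokenization_chatglm import ChatGLMTokenizer
from kolors.pipelines.pipeline_stable_diffusion_xl_chatglm_256 import StableDiffusionXLPipeline

text_encoder = ChatGLMModel.from_pretrained("Kwai-Kolors/Kolors", subfolder="text_encoder", torch_dtype=torch.float16)
tokenizer = ChatGLMTokenizer.from_pretrained("Kwai-Kolors/Kolors", subfolder="text_encoder")

pipe = StableDiffusionXLPipeline.from_pretrained(
    "Kwai-Kolors/Kolors",
    tokenizer=tokenizer,
    text_encoder=text_encoder,
    torch_dtype=torch.float16,
    variant="fp16",
)
# Requires accelerate; each component is moved to the GPU on demand and back
# to CPU afterwards, trading speed for peak VRAM.
pipe.enable_model_cpu_offload()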

@sayakpaul
Member

I think we should support this model to welcome more models that are inherently multi-lingual. What do we need to get it in?

@JincanDeng
Author

> The weights seem to be saved with errors. […] I can open a PR to fix it if you want. If you fix that, we can load the model like this: […]

Thank you for your suggestion. We've fixed the model's fp16 and fp32 weights on Hugging Face. However, the pipeline still throws an error when loading directly via from_pretrained.

My running code:

import torch

from kolors.pipelines.pipeline_stable_diffusion_xl_chatglm_256 import StableDiffusionXLPipeline
from kolors.models.tokenization_chatglm import ChatGLMTokenizer
from kolors.models.modeling_chatglm import ChatGLMModel

# ckpt_dir is the local Kolors checkpoint directory
text_encoder = ChatGLMModel.from_pretrained(ckpt_dir, subfolder="text_encoder", torch_dtype=torch.float16)
tokenizer = ChatGLMTokenizer.from_pretrained(ckpt_dir, subfolder="text_encoder")

pipe = StableDiffusionXLPipeline.from_pretrained(
    ckpt_dir,
    tokenizer=tokenizer,
    text_encoder=text_encoder,
    torch_dtype=torch.float16,
    variant="fp16",
).to("cuda")

The error is:

IndexError                                Traceback (most recent call last)
Cell In[13], line 1
----> 1 pipe = StableDiffusionXLPipeline.from_pretrained(
      2     ckpt_dir,
      3     tokenizer=tokenizer,
      4     text_encoder=text_encoder,
      5     torch_dtype=torch.float16,
      6     variant="fp16",
      7 ).to("cuda")

File /new_share/dengjincan/conda/envs/kolors/lib/python3.8/site-packages/huggingface_hub/utils/_validators.py:118, in validate_hf_hub_args.<locals>._inner_fn(*args, **kwargs)
    115 if check_use_auth_token:
    116     kwargs = smoothly_deprecate_use_auth_token(fn_name=fn.__name__, has_token=has_token, kwargs=kwargs)
--> 118 return fn(*args, **kwargs)

File /new_share/dengjincan/conda/envs/kolors/lib/python3.8/site-packages/diffusers/pipelines/pipeline_utils.py:736, in DiffusionPipeline.from_pretrained(cls, pretrained_model_name_or_path, **kwargs)
    734 folder_path = os.path.join(cached_folder, folder)
    735 is_folder = os.path.isdir(folder_path) and folder in config_dict
--> 736 variant_exists = is_folder and any(
    737     p.split(".")[1].startswith(variant) for p in os.listdir(folder_path)
    738 )
    739 if variant_exists:
    740     model_variants[folder] = variant

File /new_share/dengjincan/conda/envs/kolors/lib/python3.8/site-packages/diffusers/pipelines/pipeline_utils.py:737, in <genexpr>(.0)
    734 folder_path = os.path.join(cached_folder, folder)
    735 is_folder = os.path.isdir(folder_path) and folder in config_dict
    736 variant_exists = is_folder and any(
--> 737     p.split(".")[1].startswith(variant) for p in os.listdir(folder_path)
    738 )
    739 if variant_exists:
    740     model_variants[folder] = variant

IndexError: list index out of range

@asomoza
Member

asomoza commented Jul 7, 2024

That error is because you have a __pycache__ directory in the text encoder folder; the variant check splits each filename on "." and a dot-less directory name makes it fail. If you delete it, it should work.
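For example, from Python (the checkpoint path here is hypothetical; point it at your local download):

import shutil
from pathlib import Path

ckpt_dir = Path("/path/to/Kwai-Kolors/Kolors")  # hypothetical local path
for cache in ckpt_dir.rglob("__pycache__"):
    shutil.rmtree(cache)  # stray bytecode caches trip the filename-based variant check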


@vladmandic
Contributor

Tried using the standard implementation:

import torch
import transformers
import diffusers

text_encoder = transformers.AutoModel.from_pretrained('THUDM/chatglm3-6b', torch_dtype=torch.float16, trust_remote_code=True)
tokenizer = transformers.AutoTokenizer.from_pretrained('THUDM/chatglm3-6b', trust_remote_code=True)
pipe = diffusers.StableDiffusionXLPipeline.from_pretrained('Kwai-Kolors/Kolors', tokenizer=tokenizer, text_encoder=text_encoder)

This loads the text_encoder and tokenizer without issues, but fails when initializing the pipe:

Kwai-Kolors/Kolors text_encoder/kolors.py as defined in model_index.json does not exist in Kwai-Kolors/Kolors and is not a module in 'diffusers/pipelines'

The Kolors pipeline is similar-but-different to the SDXL pipeline, which means loading needs to use the actual custom pipeline class:

from kolors.models.modeling_chatglm import ChatGLMModel
from kolors.models.tokenization_chatglm import ChatGLMTokenizer
from kolors.pipelines.pipeline_stable_diffusion_xl_chatglm_256 import StableDiffusionXLPipeline

But you should not redefine the well-known StableDiffusionXLPipeline class; that will break tons of other things! Either it works as the standard StableDiffusionXLPipeline class or it's a custom class, and if it's a custom class, this needs a full PR (see the sketch below for an interim option).
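In the meantime, diffusers' existing custom_pipeline mechanism might serve as a stopgap: it loads the pipeline class from a local directory instead of redefining StableDiffusionXLPipeline. A sketch, assuming you've copied the reference pipeline into a local directory as pipeline.py ("./kolors_pipeline" is a hypothetical path):

import torch
from diffusers import DiffusionPipeline

# "./kolors_pipeline" is a hypothetical local directory containing a
# pipeline.py, e.g. a copy of pipeline_stable_diffusion_xl_chatglm_256.py.
pipe = DiffusionPipeline.from_pretrained(
    "Kwai-Kolors/Kolors",
    custom_pipeline="./kolors_pipeline",
    torch_dtype=torch.float16,
    variant="fp16",
).to("cuda")

The ChatGLM text encoder and tokenizer would likely still have to be passed in explicitly, as in the snippets above, since model_index.json references classes that don't ship with diffusers.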

asomoza mentioned this issue on Jul 9, 2024