
Support Granite 3.0 and 3.1 models #558

Open · wants to merge 1 commit into main

Conversation


@JamesKunstle JamesKunstle commented Feb 5, 2025

Granite 3.0 and 3.1 models are Llama-architecture models with different scaling terms in a few places. This commit adds model patching for decoder-only Granite 3 models (not multimodal) and the corresponding tests.

Summary

This change enables patching Granite 3.0 and 3.1 models with Liger kernels. We would like to use Liger kernels in our training implementation, but we're a Granite-first codebase for the moment.
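For context, this kind of patching works by swapping a module class's `forward` for a fused implementation at the class level, so every instantiated model picks up the fused path. The following is a minimal pure-Python sketch of that pattern only; the class and function names are stand-ins, not the actual `transformers` or Liger APIs used in this PR.

```python
# Sketch of the class-level monkey-patching pattern used by Liger-style
# integrations. Dummy classes stand in for HF/Liger modules; names are
# illustrative, not the PR's actual API.

class GraniteMLP:
    """Stand-in for the stock (unfused) MLP module."""
    def forward(self, x):
        return [v * 2 for v in x]  # placeholder for gate/up/down projections

def fused_swiglu_forward(self, x):
    """Stand-in for a fused SwiGLU kernel computing the same result."""
    return [v * 2 for v in x]

def apply_patch(model_cls):
    # Patch on the *class*, as apply_liger_kernel_* helpers do, so all
    # present and future instances use the fused implementation.
    model_cls.forward = fused_swiglu_forward

apply_patch(GraniteMLP)
mlp = GraniteMLP()
print(mlp.forward([1.0, 2.0]))  # fused path, same math: [2.0, 4.0]
```

Because the patch replaces the class attribute rather than per-instance methods, it must be applied before (or is visible regardless of) model instantiation, which is why such helpers are typically called before loading the model.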

Testing Done

Convergence tests confirm that loss and model parameters are equivalent with and without Liger kernels. Logits, however, are not equivalent even when only swapping the SwiGLU MLP layer. The atol and rtol may need to be tuned for Granite vs. Llama; I'm going to continue investigating this before this PR is merged.

  • Hardware Type: EC2 g6e.12xlarge; 4xL40s
  • run make test to ensure correctness
  • run make checkstyle to ensure code style
  • run make test-convergence to ensure convergence
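To make the tolerance tuning above concrete, the logit comparison boils down to an elementwise `|a - b| <= atol + rtol * |b|` check (the semantics of `torch.allclose`). A pure-Python sketch with hypothetical logit values shows how tolerances that pass for one model family can fail for another:

```python
def allclose(a, b, rtol=1e-5, atol=1e-8):
    """Elementwise |x - y| <= atol + rtol * |y|, mirroring torch.allclose."""
    return all(abs(x - y) <= atol + rtol * abs(y) for x, y in zip(a, b))

ref   = [0.12345, -1.00010, 3.14159]  # hypothetical reference logits
fused = [0.12346, -1.00012, 3.14160]  # hypothetical fused-kernel logits

print(allclose(ref, fused, rtol=1e-5, atol=1e-8))  # tight tolerances: False
print(allclose(ref, fused, rtol=1e-4, atol=1e-5))  # loosened: True
```

The point is that differences of order 1e-5 per element are within floating-point reordering noise for fused kernels, so a Granite-specific tolerance may be legitimate rather than a correctness bug.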

Granite 3.(0,1) models are Llama-architecture models with some different scaling
terms in various places. This commit adds granite model patching for
decoder-only granite 3 models (not multimodal) and the corresponding
tests.

Signed-off-by: James Kunstle <jkunstle@redhat.com>
@JamesKunstle (Author)

Fixes #557
