Support Granite 3.0 and 3.1 models #558

JamesKunstle · 2025-02-05T03:37:33Z

Granite 3.(0,1) models are Llama-architecture models with some different scaling terms in various places. This commit adds Granite model patching for decoder-only Granite 3 models (not multimodal) and the corresponding tests.
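For reviewers less familiar with Granite, here is a rough sketch of where the "different scaling terms" enter relative to a plain Llama forward pass. The four field names follow my reading of the Hugging Face `GraniteConfig`; the helper functions and the numeric defaults are hypothetical and purely illustrative, not code from this PR or values from any released checkpoint.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ScalingTerms:
    # Illustrative defaults only (assumption); real values come from the model config.
    embedding_multiplier: float = 12.0
    residual_multiplier: float = 0.25
    attention_multiplier: float = 0.015625
    logits_scaling: float = 8.0


def scale_embeddings(embeds: List[float], cfg: ScalingTerms) -> List[float]:
    # Token embeddings are multiplied by a constant before the first decoder layer.
    return [x * cfg.embedding_multiplier for x in embeds]


def attention_score(q: List[float], k: List[float], cfg: ScalingTerms) -> float:
    # The dot product is scaled by a configured constant rather than 1/sqrt(head_dim).
    return sum(qi * ki for qi, ki in zip(q, k)) * cfg.attention_multiplier


def residual_add(residual: List[float], block_out: List[float], cfg: ScalingTerms) -> List[float]:
    # The block output is scaled before being added back to the residual stream.
    return [r + cfg.residual_multiplier * h for r, h in zip(residual, block_out)]


def scale_logits(logits: List[float], cfg: ScalingTerms) -> List[float]:
    # Final logits are divided by a constant before the loss or softmax.
    return [z / cfg.logits_scaling for z in logits]
```

Setting the embedding, residual, and logits terms to 1.0 recovers the usual Llama behaviour for those three steps, which is why most of the existing Llama patching logic can be reused.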

Summary

This change enables patching Granite 3.(0,1) models w/ Liger kernels. We would like to use Liger kernels in our training implementation, but we're a Granite-first codebase for the moment.

Testing Done

Convergence tests confirm that loss and model parameters are equivalent w/ and w/o Liger kernels. Logits, however, are not equivalent even when only swapping the SwiGLUMLP layer. The atol and rtol may need to be tuned for Granite vs. Llama; I'm going to continue investigating this before this PR is merged.
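To make the tolerance question concrete, here is a minimal sketch of the element-wise rule that `torch.allclose` documents, `|actual - expected| <= atol + rtol * |expected|`, written in plain Python. The helper names are hypothetical and this is not the comparison code used by the convergence tests; it only shows how atol and rtol interact when deciding whether two sets of logits "match".

```python
from typing import Sequence


def is_close(actual: float, expected: float, atol: float = 1e-8, rtol: float = 1e-5) -> bool:
    # Same inequality that torch.allclose documents, applied to a single element.
    return abs(actual - expected) <= atol + rtol * abs(expected)


def count_mismatches(
    actual: Sequence[float],
    expected: Sequence[float],
    atol: float = 1e-8,
    rtol: float = 1e-5,
) -> int:
    # Number of positions where the tolerance check fails.
    if len(actual) != len(expected):
        raise ValueError("sequences must have the same length")
    return sum(not is_close(a, e, atol, rtol) for a, e in zip(actual, expected))


if __name__ == "__main__":
    reference = [1.0, 2.001, 3.5]   # e.g. logits without Liger kernels (made-up numbers)
    patched = [1.0, 2.0, 3.0]       # e.g. logits with Liger kernels (made-up numbers)
    print(count_mismatches(patched, reference, atol=1e-2, rtol=0.0))
    print(count_mismatches(patched, reference, atol=1e-2, rtol=0.2))
```

Because the rtol term grows with the magnitude of the reference value, a model whose logits are rescaled (as Granite's are) can plausibly need different tolerances than Llama even when the underlying kernels are the same.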

Hardware Type: EC2 g6e.12xlarge; 4x L40S
run make test to ensure correctness
run make checkstyle to ensure code style
run make test-convergence to ensure convergence

Signed-off-by: James Kunstle <jkunstle@redhat.com>

JamesKunstle · 2025-02-05T03:38:05Z

Fixes #557
