Add qk norm optionally before attention calculation #8820

Merged
merged 1 commit into pytorch:main from export-D70355802 on Mar 6, 2025

Conversation

madhu-fb
Contributor

Differential Revision: D70355802


pytorch-bot bot commented Feb 28, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/8820

Note: Links to docs will display an error until the docs builds have been completed.

❌ 2 Cancelled Jobs

As of commit 5b2587a with merge base 73acde9:

CANCELLED JOBS - The following jobs were cancelled. Please retry:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot added the CLA Signed label (This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed.) on Feb 28, 2025
@facebook-github-bot
Contributor

This pull request was exported from Phabricator. Differential Revision: D70355802

@madhu-fb marked this pull request as draft on February 28, 2025 05:54
@madhu-fb
Contributor Author

@pytorchbot label "topic: not user facing"

@madhu-fb marked this pull request as ready for review on March 5, 2025 19:42
madhu-fb added a commit to madhu-fb/executorch that referenced this pull request Mar 5, 2025
Summary:

Some of the new llama checkpoints developed by genai use an additional qk_norm in the attention calculation. To run these models with executorch and have parity with the server models, we require an optional qk norm in the ET attention.

RMSNorm is refactored into a separate file so that there is no circular dependency between attention and llama_transformer.

Differential Revision: D70355802
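
For readers following along, here is a minimal sketch of what an optional qk norm in attention can look like. This is illustrative only and not the ExecuTorch code in this PR: the class and argument names (`RMSNorm`, `Attention`, `use_qk_norm`) and the per-head placement of the norm are assumptions, and rotary embeddings and the KV cache are omitted for brevity.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class RMSNorm(nn.Module):
    """Stand-in for an RMSNorm module defined in its own file (avoids a circular import)."""

    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Scale by the reciprocal root-mean-square of the last dimension.
        rms = torch.rsqrt(x.pow(2).mean(-1, keepdim=True) + self.eps)
        return x * rms * self.weight


class Attention(nn.Module):
    """Self-attention with an optional RMSNorm applied to q and k per head."""

    def __init__(self, dim: int, n_heads: int, use_qk_norm: bool = False):
        super().__init__()
        self.n_heads = n_heads
        self.head_dim = dim // n_heads
        self.wq = nn.Linear(dim, dim, bias=False)
        self.wk = nn.Linear(dim, dim, bias=False)
        self.wv = nn.Linear(dim, dim, bias=False)
        self.wo = nn.Linear(dim, dim, bias=False)
        # The optional qk norm: applied to the per-head query/key vectors
        # before the attention scores are computed.
        self.q_norm = RMSNorm(self.head_dim) if use_qk_norm else None
        self.k_norm = RMSNorm(self.head_dim) if use_qk_norm else None

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        bsz, seqlen, _ = x.shape
        q = self.wq(x).view(bsz, seqlen, self.n_heads, self.head_dim)
        k = self.wk(x).view(bsz, seqlen, self.n_heads, self.head_dim)
        v = self.wv(x).view(bsz, seqlen, self.n_heads, self.head_dim)
        if self.q_norm is not None:
            q = self.q_norm(q)
        if self.k_norm is not None:
            k = self.k_norm(k)
        # (bsz, n_heads, seqlen, head_dim) for scaled_dot_product_attention.
        q, k, v = (t.transpose(1, 2) for t in (q, k, v))
        out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        out = out.transpose(1, 2).reshape(bsz, seqlen, -1)
        return self.wo(out)


# Checkpoints without qk norm keep the default; ones that use it opt in.
attn = Attention(dim=256, n_heads=8, use_qk_norm=True)
y = attn(torch.randn(2, 16, 256))
```

Whether the norm is applied per head or to the full projection, and whether it runs before or after rotary embeddings, depends on the checkpoint; the sketch only shows the optional wiring around the attention computation.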
Contributor

@iseeyuan left a comment

LGTM. Thank you @madhu-fb !

@madhu-fb force-pushed the export-D70355802 branch 2 times, most recently from 51a970b to 388f54d on March 6, 2025 19:59
Summary:
Pull Request resolved: pytorch#8820

Some of the new llama checkpoints developed by genai use an additional qk_norm in the attention calculation. To run these models with executorch and have parity with the server models, we require an optional qk norm in the ET attention.

RMSNorm is refactored into a separate file so that there is no circular dependency between attention and llama_transformer.

Reviewed By: iseeyuan

Differential Revision: D70355802

@facebook-github-bot merged commit 352416e into pytorch:main on Mar 6, 2025
48 of 52 checks passed
Labels
CLA Signed (This label is managed by the Facebook bot; authors need to sign the CLA before a PR can be reviewed), fb-exported, topic: not user facing
Projects
None yet
3 participants