Add qk norm optionally before attention calculation #8820

Merged
merged 1 commit into pytorch:main from export-D70355802 on Mar 6, 2025

Conversation

madhu-fb
Contributor

Differential Revision: D70355802


pytorch-bot bot commented Feb 28, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/8820

Note: Links to docs will display an error until the docs builds have been completed.

❌ 2 Cancelled Jobs

As of commit 5b2587a with merge base 73acde9:

CANCELLED JOBS - The following jobs were cancelled. Please retry:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot added the CLA Signed label (This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed.) on Feb 28, 2025
@facebook-github-bot
Contributor

This pull request was exported from Phabricator. Differential Revision: D70355802

@madhu-fb marked this pull request as draft on February 28, 2025 05:54
@madhu-fb
Contributor Author

@pytorchbot label "topic: not user facing"

@madhu-fb marked this pull request as ready for review on March 5, 2025 19:42
madhu-fb added a commit to madhu-fb/executorch that referenced this pull request Mar 5, 2025
Summary:

Some of the new llama checkpoints developed by genai use an additional qk_norm in the attention calculation. To run these models with executorch and have parity with the server models, we require an optional qk norm in the ET attention.

RMSNorm is refactored into a separate file so that there is no circular dependency between attention and llama_transformer.

Differential Revision: D70355802
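
For readers following along, here is a minimal sketch of what an optional qk norm in attention can look like. This is illustrative only and not the ExecuTorch code in this PR: the class and argument names (`RMSNorm`, `Attention`, `use_qk_norm`) and the per-head placement of the norm are assumptions, and rotary embeddings and the KV cache are omitted for brevity.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class RMSNorm(nn.Module):
    """Stand-in for an RMSNorm module defined in its own file (avoids a circular import)."""

    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Scale by the reciprocal root-mean-square of the last dimension.
        rms = torch.rsqrt(x.pow(2).mean(-1, keepdim=True) + self.eps)
        return x * rms * self.weight


class Attention(nn.Module):
    """Self-attention with an optional RMSNorm applied to q and k per head."""

    def __init__(self, dim: int, n_heads: int, use_qk_norm: bool = False):
        super().__init__()
        self.n_heads = n_heads
        self.head_dim = dim // n_heads
        self.wq = nn.Linear(dim, dim, bias=False)
        self.wk = nn.Linear(dim, dim, bias=False)
        self.wv = nn.Linear(dim, dim, bias=False)
        self.wo = nn.Linear(dim, dim, bias=False)
        # The optional qk norm: applied to the per-head query/key vectors
        # before the attention scores are computed.
        self.q_norm = RMSNorm(self.head_dim) if use_qk_norm else None
        self.k_norm = RMSNorm(self.head_dim) if use_qk_norm else None

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        bsz, seqlen, _ = x.shape
        q = self.wq(x).view(bsz, seqlen, self.n_heads, self.head_dim)
        k = self.wk(x).view(bsz, seqlen, self.n_heads, self.head_dim)
        v = self.wv(x).view(bsz, seqlen, self.n_heads, self.head_dim)
        if self.q_norm is not None:
            q = self.q_norm(q)
        if self.k_norm is not None:
            k = self.k_norm(k)
        # (bsz, n_heads, seqlen, head_dim) for scaled_dot_product_attention.
        q, k, v = (t.transpose(1, 2) for t in (q, k, v))
        out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        out = out.transpose(1, 2).reshape(bsz, seqlen, -1)
        return self.wo(out)


# Checkpoints without qk norm keep the default; ones that use it opt in.
attn = Attention(dim=256, n_heads=8, use_qk_norm=True)
y = attn(torch.randn(2, 16, 256))
```

Whether the norm is applied per head or to the full projection, and whether it runs before or after rotary embeddings, depends on the checkpoint; the sketch only shows the optional wiring around the attention computation.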
Contributor

@iseeyuan left a comment

LGTM. Thank you @madhu-fb !

@madhu-fb force-pushed the export-D70355802 branch 2 times, most recently from 51a970b to 388f54d on March 6, 2025 19:59
Summary:
Pull Request resolved: pytorch#8820

Some of the new llama checkpoints developed by genai use an additional qk_norm in the attention calculation. To run these models with executorch and have parity with the server models, we require an optional qk norm in the ET attention.

RMSNorm is refactored into a separate file so that there is no circular dependency between attention and llama_transformer.

Reviewed By: iseeyuan

Differential Revision: D70355802

@facebook-github-bot merged commit 352416e into pytorch:main on Mar 6, 2025
48 of 52 checks passed
Labels
CLA Signed (This label is managed by the Facebook bot; authors need to sign the CLA before a PR can be reviewed), fb-exported, topic: not user facing
Projects
None yet
3 participants