
refactor: Un-nest nn.Sequential in ResidualBlock.hidden_layer #133

Open
wants to merge 1 commit into
base: master
from


TeddyHuang-00
Contributor

Thank you for your brilliant work! I saw that the official PyTorch implementation was added a few days ago, so I wrote a script for converting the PAX model checkpoints into PyTorch ones. I hope this is helpful to you as well as to other fellow researchers!

This PR consists of the following changes:

  • Add a conversion script for converting PAX model checkpoints to PyTorch state dicts
  • Update the README with instructions on how to use the conversion script
  • Un-nest the nn.Sequential in ResidualBlock.hidden_layer to match the layout of the other child modules, ResidualBlock.output_layer and ResidualBlock.residual_layer

The original hidden_layer in ResidualBlock consists of an nn.Linear followed by an nn.SiLU. Separating them does not change the forward computation, but it makes the layer structure consistent with the other child modules, output_layer and residual_layer.
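A minimal PyTorch sketch of the change, assuming the block computes output_layer(hidden) + residual_layer(x) and that hidden_layer was nn.Sequential(nn.Linear, nn.SiLU) in that order. The class names NestedResidualBlock and FlatResidualBlock, the activation attribute, and the remap_nested_keys helper are hypothetical illustrations, not code from this PR. It shows that the forward pass is unchanged once the weights are copied over, and also that parameter names in the state dict move from hidden_layer.0.* to hidden_layer.*, so checkpoints saved with the nested layout would need their keys renamed before load_state_dict accepts them in strict mode.

```python
import torch
from torch import nn


class NestedResidualBlock(nn.Module):
    """Assumed layout before the change: Linear + SiLU wrapped in nn.Sequential."""

    def __init__(self, input_dims: int, hidden_dims: int, output_dims: int):
        super().__init__()
        self.hidden_layer = nn.Sequential(
            nn.Linear(input_dims, hidden_dims),
            nn.SiLU(),
        )
        self.output_layer = nn.Linear(hidden_dims, output_dims)
        self.residual_layer = nn.Linear(input_dims, output_dims)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        hidden = self.hidden_layer(x)
        return self.output_layer(hidden) + self.residual_layer(x)


class FlatResidualBlock(nn.Module):
    """Assumed layout after the change: hidden_layer is a plain nn.Linear."""

    def __init__(self, input_dims: int, hidden_dims: int, output_dims: int):
        super().__init__()
        self.hidden_layer = nn.Linear(input_dims, hidden_dims)
        self.activation = nn.SiLU()  # hypothetical attribute name; SiLU has no parameters
        self.output_layer = nn.Linear(hidden_dims, output_dims)
        self.residual_layer = nn.Linear(input_dims, output_dims)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        hidden = self.activation(self.hidden_layer(x))
        return self.output_layer(hidden) + self.residual_layer(x)


def remap_nested_keys(state_dict: dict) -> dict:
    """Hypothetical helper: rename 'hidden_layer.0.*' keys to 'hidden_layer.*'."""
    return {
        key.replace("hidden_layer.0.", "hidden_layer.", 1): value
        for key, value in state_dict.items()
    }


if __name__ == "__main__":
    torch.manual_seed(0)
    nested = NestedResidualBlock(input_dims=8, hidden_dims=16, output_dims=4)
    flat = FlatResidualBlock(input_dims=8, hidden_dims=16, output_dims=4)

    print(sorted(nested.state_dict()))  # includes 'hidden_layer.0.weight'
    print(sorted(flat.state_dict()))    # includes 'hidden_layer.weight'

    # Loading nested.state_dict() directly would raise a RuntimeError in
    # strict mode (missing/unexpected keys), so rename the keys first.
    flat.load_state_dict(remap_nested_keys(nested.state_dict()))

    x = torch.randn(2, 8)
    print(torch.allclose(nested(x), flat(x)))  # True: same weights, same ops
```

Because the rename is a plain substring replacement, the same helper would also apply when the block sits deeper inside a larger model and its keys carry a longer prefix.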

google-cla bot commented Aug 29, 2024

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

@TeddyHuang-00
Contributor Author

A side note: I am not sure how, or whether, the PyTorch model will be included in the published package, or whether it will be an optional feature like timesfm[pytorch]. I put the conversion script there because there seemed to be no better place for it.

@rajatsen91
Collaborator

Hi @TeddyHuang-00, thanks and nice work. I am OK to merge the changes to the residual block. For the weight conversion, we already have a version of that, but we have not checked it in because we are contemplating uploading the PyTorch weights directly to Hugging Face. If you can split the pull request, I can check in the residual block change. For convert_weights, we need to think a little more.

@TeddyHuang-00
Contributor Author

Hi @rajatsen91, glad to hear that you already have a working solution. I have updated this PR; please let me know if I can help with the PyTorch version. I am glad to help!

@TeddyHuang-00 TeddyHuang-00 changed the title feat: convert model to PyTorch refactor: Un-nest nn.Sequential in ResidualBlock.hidden_layer Aug 29, 2024