
register_model_architecture not working in version 0.10.1 #3097

Closed
voidmagic opened this issue Jan 5, 2021 · 3 comments

@voidmagic
Contributor

🐛 Bug

Using register_model_architecture to register a new hyperparameter set fails: the registered hyperparameters are overridden by the default settings.

To Reproduce

Steps to reproduce the behavior (always include the command you ran):

  1. Run: fairseq-train --task language_modeling data-bin --arch transformer_lm_big --batch-size 1 --optimizer adam
  2. Observe that the model is built with 6 decoder layers, although transformer_lm_big defines 12: https://github.com/pytorch/fairseq/blob/v0.10.1/fairseq/models/transformer_lm.py#L311

Code sample
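For reference, the architecture function being selected is roughly the following (paraphrased from the transformer_lm.py file linked above; exact values may differ from the source):

@register_model_architecture("transformer_lm", "transformer_lm_big")
def transformer_lm_big(args):
    args.decoder_layers = getattr(args, "decoder_layers", 12)
    args.decoder_embed_dim = getattr(args, "decoder_embed_dim", 1024)
    args.decoder_ffn_embed_dim = getattr(args, "decoder_ffn_embed_dim", 4096)
    args.decoder_attention_heads = getattr(args, "decoder_attention_heads", 16)
    base_lm_architecture(args)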

Expected behavior

The model is built with 12 decoder layers, as defined by the transformer_lm_big architecture.

Environment

  • fairseq Version: 0.10.1


@myleott

myleott commented Jan 5, 2021

Good catch. It's only broken for TransformerLanguageModel in 0.10.1, because that model was migrated to Hydra (dataclass-based config) before the migration was completed.

The fix is to patch this commit: b7d8b9d. I'll release 0.10.2 with the fix shortly.

myleott self-assigned this Jan 5, 2021
myleott added the 0.10.2 label Jan 5, 2021
@myleott

myleott commented Jan 5, 2021

Fixed in 0.10.2

myleott closed this as completed Jan 5, 2021
@xuuHuang

The hyperparameters of the transformer model are still overridden by the default settings in TransformerConfig in 0.10.2.
If I set arch=transformer_tiny, there are 6 layers in the model when there should be 2.
I think that's because add_args() in TransformerModelBase sets the default hyperparameters:

@classmethod
def add_args(cls, parser):
    """Add model-specific arguments to the parser."""
    # delete_default=False keeps the TransformerConfig defaults in the parser
    gen_parser_from_dataclass(
        parser, TransformerConfig(), delete_default=False, with_prefix=""
    )
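A minimal standalone sketch of that interaction (a toy argparse example, not fairseq code): because delete_default=False leaves the TransformerConfig defaults on the parser, args already carries a value for every option, so the getattr(args, ..., fallback) pattern used by the registered architecture functions never applies the registered value.

import argparse

# Stand-in for gen_parser_from_dataclass(..., delete_default=False), which leaves
# the dataclass default (6 layers) on the option.
parser = argparse.ArgumentParser()
parser.add_argument("--encoder-layers", type=int, default=6)
args = parser.parse_args([])

# What an architecture function such as transformer_tiny effectively does:
args.encoder_layers = getattr(args, "encoder_layers", 2)

print(args.encoder_layers)  # prints 6: the parser default wins, the registered 2 is never applied

If the default were suppressed on the parser instead (argparse.SUPPRESS), args would not carry the attribute and the getattr fallback would take effect.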
