Hi, I just trained my model locally and compared its results with the ones in the README, and they are different. I believe this is because the embeddings of the previously mentioned tokens change every time the model is instantiated. For instance, with the same phrase, if I instantiate the model and predict, the output is different from the next time I instantiate the model and predict on that phrase.
I believe that in the __init__ method of the infer_from_trained class, the call to resize_token_embeddings() at line 83 of the infer.py file extends the embeddings to hold the 4 extra tokens, but the new embeddings are initialized randomly, and this causes the results to vary.
Am I understanding it correctly? Or am I mistaken? Any help would be appreciated.
I have never encountered this issue. After resize_token_embeddings(), the trained model weights are loaded with load_state, which loads the trained embeddings, so there is no reason for them to change on every load.
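To make the order of operations concrete, here is a minimal sketch in plain PyTorch. It is not the repository's code: resize_embedding and build_model are hypothetical stand-ins, and the sizes (10 base tokens, 4 extra tokens, dimension 8) are made up for illustration. It assumes the resize step leaves the new rows with fresh random values, and that loading a saved state dict afterwards overwrites every row.

```python
import torch
import torch.nn as nn


def resize_embedding(old: nn.Embedding, new_num_tokens: int) -> nn.Embedding:
    """Hypothetical stand-in for a resize step: copy the old rows and
    leave the added rows at their random initial values."""
    new = nn.Embedding(new_num_tokens, old.embedding_dim)
    n = min(old.num_embeddings, new_num_tokens)
    with torch.no_grad():
        new.weight[:n] = old.weight[:n]
    return new


def build_model(saved_state=None, base_vocab=10, extra_tokens=4, dim=8):
    """Hypothetical constructor: resize first, then optionally load weights."""
    base = nn.Embedding(base_vocab, dim)
    emb = resize_embedding(base, base_vocab + extra_tokens)
    if saved_state is not None:
        # Loading the checkpoint replaces all rows, including the added ones.
        emb.load_state_dict(saved_state)
    return emb


# Pretend this is the trained model, and keep a copy of its weights.
trained = build_model()
saved_state = {k: v.clone() for k, v in trained.state_dict().items()}

# Without loading a checkpoint, the 4 added rows differ between instantiations.
a, b = build_model(), build_model()
without_ckpt_equal = torch.equal(a.weight[-4:], b.weight[-4:])

# With the checkpoint loaded after resizing, every instantiation matches.
c, d = build_model(saved_state), build_model(saved_state)
with_ckpt_equal = torch.equal(c.weight, d.weight)

print("added rows equal without checkpoint:", without_ckpt_equal)
print("all rows equal with checkpoint:", with_ckpt_equal)
```

If the outputs still vary between runs, one thing worth checking under these assumptions is whether the checkpoint is actually found and loaded after the resize, and whether it contains the resized embedding matrix.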