[BUG: Error during training #85

Open
Chasapas opened this issue Jul 25, 2024 · 1 comment
Labels
bug Something isn't working

Comments

@Chasapas

Python Version

Python 3.10.12 (main, Mar 22 2024, 16:50:05) [GCC 11.4.0]

Error:

TypeError: DataArgs.__init__() got an unexpected keyword argument 'no_eval'

RuntimeError: Couldn't instantiate class <class 'finetune.data.args.DataArgs'> using init args dict_keys(['data', 'no_eval']): DataArgs.__init__() got an unexpected keyword argument 'no_eval'
[2024-07-25 08:25:20,114] torch.distributed.elastic.multiprocessing.api: [ERROR] failed (exitcode: 1) local_rank: 0 (pid: 1444114) of binary: /home/path/venv/bin/python
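
For context on the failure mode: this is the standard error a Python dataclass raises when it is constructed with a keyword it does not declare, which matches the dict_keys(['data', 'no_eval']) in the message above. A minimal sketch, using a hypothetical trimmed-down DataArgs (not the real finetune.data.args.DataArgs), reproduces essentially the same message on Python 3.10:

from dataclasses import dataclass

@dataclass
class DataArgs:
    # hypothetical stand-in; the real class lives in finetune.data.args
    data: str = ""

# keys taken from the data: block of the YAML config
section = {"data": "output.jsonl", "no_eval": True}

# forwarding the whole block as keyword arguments fails on the undeclared key:
# TypeError: DataArgs.__init__() got an unexpected keyword argument 'no_eval'
DataArgs(**section)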

Pip Freeze

absl-py==2.1.0
annotated-types==0.7.0
attrs==23.2.0
certifi==2024.7.4
charset-normalizer==3.3.2
docstring_parser==0.16
filelock==3.15.4
fire==0.6.0
fsspec==2024.6.1
grpcio==1.65.1
idna==3.7
Jinja2==3.1.4
jsonschema==4.21.1
jsonschema-specifications==2023.12.1
Markdown==3.6
MarkupSafe==2.1.5
mistral_common==1.3.3
mpmath==1.3.0
networkx==3.3
numpy==1.25.0
nvidia-cublas-cu12==12.1.3.1
nvidia-cuda-cupti-cu12==12.1.105
nvidia-cuda-nvrtc-cu12==12.1.105
nvidia-cuda-runtime-cu12==12.1.105
nvidia-cudnn-cu12==8.9.2.26
nvidia-cufft-cu12==11.0.2.54
nvidia-curand-cu12==10.3.2.106
nvidia-cusolver-cu12==11.4.5.107
nvidia-cusparse-cu12==12.1.0.106
nvidia-nccl-cu12==2.19.3
nvidia-nvjitlink-cu12==12.5.82
nvidia-nvtx-cu12==12.1.105
protobuf==4.25.3
pydantic==2.6.1
pydantic_core==2.16.2
PyYAML==6.0.1
referencing==0.35.1
regex==2024.5.15
requests==2.32.3
rpds-py==0.19.0
safetensors==0.4.3
sentencepiece==0.2.0
simple_parsing==0.1.5
six==1.16.0
sympy==1.13.1
tensorboard==2.17.0
tensorboard-data-server==0.7.2
termcolor==2.4.0
tiktoken==0.7.0
torch==2.2.0
tqdm==4.66.4
triton==2.2.0
typing_extensions==4.12.2
urllib3==2.2.2
Werkzeug==3.0.3
xformers==0.0.24

Reproduction Steps

torchrun --nproc-per-node 2 --master_port $RANDOM -m train mistral-7b-v0.3/7B.yaml

Expected Behavior

Normal training, as described in the project documentation

Additional Context

data:
  data: "path/mistral-finetune/mistral-7b-v0.3/output.jsonl" # Path to your general training
  no_eval: True

Suggested Solutions

No response

@Chasapas added the bug label on Jul 25, 2024
@one-and-only

Did you remove the no_eval declaration that is already included in the other section of the example YAML? This seemed to solve that error for me.
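
For anyone landing here with the same traceback, a minimal sketch of what that suggestion implies, assuming no_eval is meant to sit at the top level of the YAML rather than inside the data: block (compare against the example 7B.yaml shipped with the repo before copying):

data:
  data: "path/mistral-finetune/mistral-7b-v0.3/output.jsonl" # Path to your general training
no_eval: True  # declared once, at the top level, not under data: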
