
train.py: error: unrecognized arguments: --local-rank=0 #134

Open
davidvct opened this issue Jan 23, 2024 · 6 comments

Comments

@davidvct

davidvct commented Jan 23, 2024

I encountered this error when trying to train on the GoPro dataset with:
python -m torch.distributed.launch --nproc_per_node=1 --master_port=4321 train.py -opt options/train/GoPro/NAFNet-width32.yml --launcher pytorch

I searched train.py, and there is no --local-rank=0 anywhere in it.

How can I fix this?

@txy00001

Add this to train.py:
[image]

@sentinel8b

sentinel8b commented Apr 24, 2024

Change

parser.add_argument('--local_rank', type=int, default=0)

To

parser.add_argument('--local-rank', type=int, default=0)

And I didn't add

os.environ['RANK'] = str(0)
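
A minimal sketch of a parser that accepts both spellings, which may help when the same script is run under PyTorch versions whose launcher passes `--local_rank` and versions whose launcher passes `--local-rank`. The function name `build_parser` is hypothetical and is not taken from the repository's train.py.

```python
import argparse


def build_parser():
    """Hypothetical helper: register both spellings for one destination."""
    parser = argparse.ArgumentParser()
    # Both option strings write to args.local_rank.
    parser.add_argument('--local_rank', '--local-rank',
                        dest='local_rank', type=int, default=0)
    return parser


# Either spelling is accepted by the same parser.
new_style = build_parser().parse_args(['--local-rank=0'])
old_style = build_parser().parse_args(['--local_rank=0'])
print(new_style.local_rank, old_style.local_rank)
```

Registering both option strings on one argument avoids editing the script again if the launcher version changes.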

@rp7sv

rp7sv commented May 10, 2024

> Change
>
> parser.add_argument('--local_rank', type=int, default=0)
>
> To
>
> parser.add_argument('--local-rank', type=int, default=0)
>
> And I didn't add
>
> os.environ['RANK'] = str(0)

Thanks. When I tried to use torchrun, it reported "can not open python: no such file"; after making your change, it works!

@tobymuller233

> Change
>
> parser.add_argument('--local_rank', type=int, default=0)
>
> To
>
> parser.add_argument('--local-rank', type=int, default=0)
>
> And I didn't add
>
> os.environ['RANK'] = str(0)

It seems that "local-rank", with a "-" in the middle instead of a "_", doesn't follow Python's naming rules.
I'm trying to debug a multi-GPU program in VS Code and configured launch.json as follows:
{
  "version": "0.2.0",
  "configurations": [
    {
      "name": "Debug Distributed Training (GPU 0)",
      "type": "debugpy",
      "request": "launch",
      "program": "${workspaceFolder}/train.py",
      "console": "integratedTerminal",
      "args": [
        "~/stu_motion/scrfd/configs/scrfd/scrfd_1g.py",
        "--launcher",
        "pytorch",
      ],
      "env": {
        "PYTHONPATH": "${workspaceFolder}/..:${env:PYTHONPATH}",
        "MASTER_ADDR": "127.0.0.1",
        "MASTER_PORT": "29500",
        "WORLD_SIZE": "2",
        "RANK": "0"
      },
      "pythonArgs": [
        "-m",
        "torch.distributed.launch",
        "--nproc_per_node=2",
        "--master_port=29500"
      ]
    },
    {
      "name": "Debug Distributed Training (GPU 1)",
      "type": "debugpy",
      "request": "launch",
      "program": "${workspaceFolder}/train.py",
      "console": "integratedTerminal",
      "args": [
        "~/stu_motion/scrfd/configs/scrfd/scrfd_1g.py",
        "--launcher",
        "pytorch",
      ],
      "env": {
        "PYTHONPATH": "${workspaceFolder}/..:${env:PYTHONPATH}",
        "MASTER_ADDR": "127.0.0.1",
        "MASTER_PORT": "29500",
        "WORLD_SIZE": "2",
        "RANK": "1"
      },
      "pythonArgs": [
        "-m",
        "torch.distributed.launch",
        "--nproc_per_node=2",
        "--master_port=29500"
      ]
    }
  ]
}
I'm not sure whether the hyphen is the cause, but the program failed to run correctly.
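
On the naming concern: argparse derives the attribute name from a long option by stripping the leading dashes and replacing inner hyphens with underscores, so a hyphenated flag is still read through a valid Python identifier. A small sketch using only the standard library:

```python
import argparse

parser = argparse.ArgumentParser()
# The flag is spelled with a hyphen on the command line...
parser.add_argument('--local-rank', type=int, default=0)

args = parser.parse_args(['--local-rank=1'])
# ...but the parsed value is stored under an underscore name.
print(args.local_rank)  # prints 1
```

This suggests the hyphen itself is unlikely to be what breaks the run, although the sketch does not reproduce the launch.json setup above.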

@dr-smgad

Hi,

I ran into this issue when debugging from VS Code with "module": "torch.distributed.launch" set in my launch.json. I got the unrecognized arguments: --local-rank=0 error because my Python file did not expect that argument (it was not among its args). It turned out that you need to pass "--use-env" as the first entry in launch.json, "args": ["--use-env", <continue other args here>], and torch.distributed.launch will then stop adding this argument automatically.

I hope it helps
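
With "--use-env", the launcher is expected to export the rank as the LOCAL_RANK environment variable instead of passing a command-line flag, so the script has to read it from the environment. A minimal sketch, assuming that behavior; the helper name `get_local_rank` is hypothetical:

```python
import os


def get_local_rank(default=0):
    """Hypothetical helper: read the rank exported by the launcher.

    Falls back to `default` when LOCAL_RANK is not set, for example
    when the script is started directly without a launcher.
    """
    return int(os.environ.get('LOCAL_RANK', default))


local_rank = get_local_rank()
print(f'local rank: {local_rank}')
```

The fallback keeps single-process runs working without any launcher.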

@Yolo1-gguo

usage: train.py [-h] -opt OPT [--launcher {none,pytorch,slurm}] [--local-rank LOCAL_RANK]
[--input_path INPUT_PATH] [--output_path OUTPUT_PATH]
train.py: error: unrecognized arguments: --local_rank=0 pytorch

How should I fix this?
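
One way to see exactly which tokens the parser rejects is `parse_known_args`, which returns the leftovers instead of exiting. The parser below is reconstructed from the usage line above for illustration only: the argument types, the `--launcher` default, and the sample command line are assumptions, not the repository's code.

```python
import argparse

# Assumed reconstruction of part of the parser shown in the usage message.
parser = argparse.ArgumentParser(prog='train.py')
parser.add_argument('-opt', type=str, required=True)
parser.add_argument('--launcher', choices=['none', 'pytorch', 'slurm'],
                    default='none')
parser.add_argument('--local-rank', type=int, default=0)

# Hypothetical command line in which the launcher passes the underscore form.
argv = ['--local_rank=0', '-opt', 'options/train/GoPro/NAFNet-width32.yml',
        '--launcher', 'pytorch']

args, unknown = parser.parse_known_args(argv)
print('parsed launcher:', args.launcher)
print('rejected tokens:', unknown)
```

In this sketch the parser only registers `--local-rank`, so the underscore form ends up in the rejected list. That matches the first half of the error above and points to a spelling mismatch between the launcher and the script.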
