Commit

Merge pull request #4 from lartpang/dev
Fixed bug and update examples.
lartpang authored Apr 20, 2024
2 parents 6a6f95a + 3c4e189 commit 60a6d2a
Showing 6 changed files with 201 additions and 193 deletions.
67 changes: 51 additions & 16 deletions README.md
@@ -14,46 +14,81 @@ Putting the machine into sleep is a disrespect for time.

```shell
$ python run_it.py --help
usage: run_it.py [-h] [--verbose] [--gpu-pool GPU_POOL [GPU_POOL ...]] [--max-workers MAX_WORKERS] --cmd-pool CMD_POOL
                 [--max-used-ratio MAX_USED_RATIO]
usage: run_it.py [-h] [--gpu-pool GPU_POOL [GPU_POOL ...]] [--max-workers MAX_WORKERS] --cmd-pool CMD_POOL
                 [--interval-for-waiting-gpu INTERVAL_FOR_WAITING_GPU] [--interval-for-loop INTERVAL_FOR_LOOP]

optional arguments:
  -h, --help            show this help message and exit
  --verbose             Whether to print the output of the subprocess.
  --gpu-pool GPU_POOL [GPU_POOL ...]
                        The pool containing all ids of your gpu devices.
  --max-workers MAX_WORKERS
                        The max number of the workers.
  --cmd-pool CMD_POOL   The text file containing all your commands. It need to contain the launcher.
  --max-used-ratio MAX_USED_RATIO
                        The max used ratio of the gpu.
  --cmd-pool CMD_POOL   The path of the yaml containing all cmds.
  --interval-for-waiting-gpu INTERVAL_FOR_WAITING_GPU
                        In seconds, the interval for waiting for a GPU to be available.
  --interval-for-loop INTERVAL_FOR_LOOP
                        In seconds, the interval for looping.
```
## demo
```shell
$ python run_it.py --verbose --cmd-pool ./examples/config.txt # with the default `gpu-pool` and `max-workers`
$ python run_it.py --verbose --gpu-pool 0 1 --max-workers 2 --cmd-pool ./examples/config.txt
$ python run_it.py --cmd-pool ./examples/config.yaml # with the default `gpu-pool` and `max-workers`
$ python run_it.py --gpu-pool 0 2 3 --max-workers 3 --cmd-pool .\examples\config.yaml
```
<details>
<summary>
./examples/config.txt
./examples/config.yaml
</summary>
```shell
$ cat ./examples/config.txt
python -m pip list
python --version
python ./examples/demo.py
python ./examples/demo.py
python ./examples/demo.py
```yaml
- name: job1
  command: "python ./examples/demo.py --value 1"
  num_gpus: 1
- name: job2
  command: "python ./examples/demo.py --value 2"
  num_gpus: 1
- name: job3
  command: "python ./examples/demo.py --value 3"
  num_gpus: 1
- name: job4
  command: "python ./examples/demo.py --value 4"
  num_gpus: 1
- name: job5
  command: "python ./examples/demo.py --value 5"
  num_gpus: 2
- { name: job6, command: "python ./examples/demo.py --value 5", num_gpus: 2 }
- { name: job7, command: "python ./examples/demo.py --value 5", num_gpus: 2 }
```
</details>
```mermaid
graph TD
A[Start] --> B[Read Configuration and Command Pool]
B --> C[Initialize Shared Resources]
C --> |Maximum number of requirements met| D[Loop Until All Jobs Done]
D --> E[Check Available GPUs]
E -->|Enough GPUs| F[Run Job in Separate Process]
E -->|Not Enough GPUs| G[Wait and Retry]
F --> H[Job Completes]
F --> I[Job Fails]
H --> J[Update Job Status and Return GPUs]
I --> J
G --> D
J -->|All Jobs Done| K[End]
C -->|Maximum number of requirements not met| L[Terminate Workers]
L --> M[Shutdown Manager and Join Pool]
M --> K
```
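The flowchart above can be sketched in a few lines of Python. This is a minimal illustration, not the code in `run_it.py`: the names `schedule`, `run_job`, and `fake_job` are hypothetical, threads stand in for the separate processes, and "maximum number of requirements" is assumed to mean that the largest `num_gpus` request must fit in the pool. Where the chart terminates the workers and shuts down, the sketch simply raises `ValueError`.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def schedule(jobs, gpu_pool, run_job, max_workers=2, interval=0.01):
    """Run each job once, lending it `num_gpus` ids from `gpu_pool` while it runs."""
    # Assumption: a job that asks for more GPUs than the pool holds could
    # never start, so refuse to run anything in that case.
    if max((job["num_gpus"] for job in jobs), default=0) > len(gpu_pool):
        raise ValueError("a job asks for more GPUs than the pool holds")

    free = list(gpu_pool)      # GPU ids that are currently unassigned
    status = {}                # job name -> "done" or "failed"
    lock = threading.Lock()    # guards `free` and `status`

    def worker(job, gpu_ids):
        try:
            run_job(job, gpu_ids)
            outcome = "done"
        except Exception:
            outcome = "failed"
        with lock:
            status[job["name"]] = outcome
            free.extend(gpu_ids)   # return the GPUs whether or not the job failed

    pending = list(jobs)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while pending:
            need = pending[0]["num_gpus"]
            with lock:
                gpu_ids = [free.pop(0) for _ in range(need)] if len(free) >= need else None
            if gpu_ids is None:
                time.sleep(interval)   # not enough GPUs yet: wait and retry
                continue
            pool.submit(worker, pending.pop(0), gpu_ids)
    # Leaving the `with` block waits for every submitted job to finish.
    return status


def fake_job(job, gpu_ids):
    """Hypothetical stand-in for launching a command on the given GPUs."""
    if job.get("fail"):
        raise RuntimeError("simulated failure")


jobs = [
    {"name": "job1", "num_gpus": 1},
    {"name": "job2", "num_gpus": 1, "fail": True},
    {"name": "job5", "num_gpus": 2},
]
result = schedule(jobs, gpu_pool=[0, 1], run_job=fake_job)
print(result)
```

A failed job returns its GPUs just like a successful one, which is why both `Job Completes` and `Job Fails` lead to the same node in the chart.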
## Thanks
- [@BitCalSaul](https://github.com/BitCalSaul): Thanks for the positive feedback!
- <https://github.com/lartpang/RunIt/issues/3>
- <https://github.com/lartpang/RunIt/issues/2>
- <https://github.com/lartpang/RunIt/issues/1>
- https://www.jb51.net/article/142787.htm
- https://docs.python.org/zh-cn/3/library/subprocess.html
- https://stackoverflow.com/a/23616229
5 changes: 0 additions & 5 deletions examples/config.txt

This file was deleted.

23 changes: 23 additions & 0 deletions examples/config.yaml
@@ -0,0 +1,23 @@
- name: job1
  command: "python ./examples/demo.py --value 1"
  num_gpus: 1
- name: job02
  command: "python ./examples/demo.py --value 1 --exception"
  num_gpus: 1
- name: job03
  command: "python ./examples/demo.py --value 1 --exception"
  num_gpus: 1
- name: job2
  command: "python ./examples/demo.py --value 2"
  num_gpus: 1
- name: job3
  command: "python ./examples/demo.py --value 3"
  num_gpus: 1
- name: job4
  command: "python ./examples/demo.py --value 4"
  num_gpus: 1
- name: job5
  command: "python ./examples/demo.py --value 5"
  num_gpus: 2
- { name: job6, command: "python ./examples/demo.py --value 5", num_gpus: 2 }
- { name: job7, command: "python ./examples/demo.py --value 5", num_gpus: 2 }
19 changes: 18 additions & 1 deletion examples/demo.py
@@ -1 +1,18 @@
print('Hello!')
import argparse
import os
import time

parser = argparse.ArgumentParser()
parser.add_argument("--value", type=int, default=0)
parser.add_argument("--exception", action="store_true", default=False)
args = parser.parse_args()


GPU_IDS = os.environ["CUDA_VISIBLE_DEVICES"]
print(f"[GPUs: {GPU_IDS}] Start {args.value}")

if args.exception:
    raise Exception(f"[GPUs: {GPU_IDS}] Internal Exception!")

time.sleep(args.value * 2)
print(f"[GPUs: {GPU_IDS}] End {args.value}")
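The demo reads `CUDA_VISIBLE_DEVICES` with `os.environ[...]`, so it raises `KeyError` unless the launcher sets that variable first. The sketch below shows one way a launcher could do that with the standard library. The helper name `run_with_gpus` is hypothetical and is not taken from `run_it.py`, and the child command is a one-liner that only echoes the variable.

```python
import os
import subprocess
import sys


def run_with_gpus(command, gpu_ids):
    """Run `command` (a list of argv strings) with only `gpu_ids` visible to CUDA."""
    env = dict(os.environ)  # copy, so the parent environment stays untouched
    env["CUDA_VISIBLE_DEVICES"] = ",".join(str(i) for i in gpu_ids)
    return subprocess.run(command, env=env, capture_output=True, text=True)


# Stand-in child process: prints the variable the same way demo.py reads it.
child = [sys.executable, "-c", "import os; print(os.environ['CUDA_VISIBLE_DEVICES'])"]
result = run_with_gpus(child, [0, 2])
print(result.stdout.strip())  # prints "0,2"
```

Passing a copied environment keeps each job's GPU assignment private to its own process, so jobs that run side by side do not overwrite each other's setting.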
3 changes: 1 addition & 2 deletions requirements.txt
@@ -1,4 +1,3 @@
# Automatically generated by https://github.com/damnever/pigar.

# RunIt/run_it.py: 12
nvidia_ml_py3 == 7.352.0
PyYAML==6.0