Gradio App with Multi-GPU #59

tpc2233 · 2024-10-13T01:01:20Z

Users can choose 2 or 4 GPUs in the UI.
The app assumes the models are located in the ./pyramid_flow_model directory within the project
(use the regular app to download them easily).

Due to Gradio's single-threaded nature, I moved the heavy computation into an external engine (app_multigpu_engine.sh and app_multigpu_engine.py).
Make sure the script is executable: chmod +x app_multigpu_engine.sh
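The external-engine pattern can be sketched roughly as follows. This is a minimal illustration, not the code from this PR. The Gradio callback blocks on a subprocess that runs app_multigpu_engine.sh. The argument order passed to the script (GPU count, model directory, duration, prompt, output path) and the helper names build_engine_command and generate_video are hypothetical.

```python
import subprocess
from pathlib import Path

# Hypothetical interface: the real engine script's arguments may differ.
ENGINE_SCRIPT = "./app_multigpu_engine.sh"
MODEL_DIR = "./pyramid_flow_model"
ALLOWED_GPU_COUNTS = (2, 4)


def build_engine_command(num_gpus: int, prompt: str, duration: int, output_path: str) -> list[str]:
    """Return the argv for one engine run (argument layout is an assumption)."""
    if num_gpus not in ALLOWED_GPU_COUNTS:
        raise ValueError(f"num_gpus must be one of {ALLOWED_GPU_COUNTS}, got {num_gpus}")
    return [ENGINE_SCRIPT, str(num_gpus), MODEL_DIR, str(duration), prompt, output_path]


def generate_video(num_gpus: int, prompt: str, duration: int, output_path: str = "output.mp4") -> str:
    """Body of a Gradio callback: wait for the external engine, then return the video path."""
    if not Path(MODEL_DIR).is_dir():
        raise FileNotFoundError(f"expected model weights in {MODEL_DIR}")
    cmd = build_engine_command(num_gpus, prompt, duration, output_path)
    # check=True raises CalledProcessError if the engine exits non-zero (e.g. after an OOM)
    subprocess.run(cmd, check=True)
    return output_path
```

Passing the command as a list means the prompt is never parsed by a shell. Because ./app_multigpu_engine.sh is invoked directly, it needs the executable bit set by chmod +x.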

Benchmarking the 768p Model on GPUs with 24GB VRAM:

Prompt:
A sloth with pink sunglasses lays on a donut float in a pool. The sloth is holding a tropical drink. The world is tropical. The sunlight casts a shadow

Results:
4x NVIDIA RTX 4090 (24GB VRAM) - 768p Model:

Duration 2: 51 seconds
Duration 4: 56 seconds
Duration 8: 74 seconds
Duration 16: 191 seconds
Duration 20: Out of Memory (OOM)
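To compare runs of different lengths, the timings can be normalized by the Duration setting. The sketch below does only arithmetic on the numbers listed above. It makes no assumption about how many frames or seconds of video one Duration unit corresponds to. The helper name seconds_per_duration_unit is hypothetical.

```python
def seconds_per_duration_unit(timings: dict[int, float]) -> dict[int, float]:
    """Divide each wall-clock time by its Duration setting."""
    return {duration: seconds / duration for duration, seconds in timings.items()}


# Timings from the 4x RTX 4090 / 768p benchmark above (Duration 20 went OOM, so it is omitted).
benchmark = {2: 51, 4: 56, 8: 74, 16: 191}

for duration, rate in seconds_per_duration_unit(benchmark).items():
    print(f"Duration {duration:>2}: {benchmark[duration]:>3} s total, {rate:.2f} s per unit")
```

This gives 25.5, 14.0, 9.25 and about 11.94 seconds per Duration unit. The per-unit cost falls up to Duration 8 and rises again at Duration 16.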


feifeiobama · 2024-10-13T01:44:09Z

Thanks for the multi-GPU Gradio app. Just a follow-up question: should we merge app_multigpu_engine.py and inference_multigpu.py, and move the bash script into the scripts/ folder?

tpc2233 · 2024-10-13T03:37:56Z

> Thanks for the multi-GPU Gradio app. Just a follow-up question: should we merge app_multigpu_engine.py and inference_multigpu.py, and put the bash script to the scripts/ folder?

Makes total sense. I've updated the scripts; the merge request is here:
#77

pharrowboy · 2024-10-14T02:07:46Z

It's worth noting that after moving the DiT to CUDA, you can add optimum-quanto to the script and you won't go OOM with longer generations. I'm currently using optimum-quanto to run the 768p model on a single 24GB 3090, while my other three 3090s run a full Flux model balanced across them, to build a text-to-Flux-to-Pyramid-Flow Gradio app.

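from optimum.quanto import freeze, qfloat8, quantize
# `model` is the already-loaded Pyramid Flow model; only its DiT is quantized here
model.dit.to("cuda:0")  # move the DiT to the GPU first
quantize(model.dit, weights=qfloat8)  # swap DiT layers for quanto modules with float8 weights
freeze(model.dit)  # convert the weights to their quantized float8 form permanently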
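Some back-of-the-envelope arithmetic shows why this helps on a 24GB card. bf16 stores 2 bytes per weight and qfloat8 stores 1, so the DiT's weights need roughly half the VRAM. The sketch below makes two assumptions: a bf16 baseline and a hypothetical 2-billion-parameter module, which is not the actual DiT size. It also ignores quanto's scale overhead. Because quantize(..., weights=qfloat8) leaves activations unquantized, the saving is extra headroom for activations; the activations themselves do not shrink.

```python
# Weight-only memory estimate; the parameter count is a hypothetical example.
BYTES_PER_PARAM = {"bf16": 2, "qfloat8": 1}


def weight_gib(num_params: float, dtype: str) -> float:
    """Approximate weight storage in GiB, ignoring quantization scales and activations."""
    return num_params * BYTES_PER_PARAM[dtype] / 2**30


params = 2e9  # hypothetical 2B-parameter transformer
for dtype in ("bf16", "qfloat8"):
    print(f"{dtype:>8}: {weight_gib(params, dtype):.2f} GiB of weights")
```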

feifeiobama merged commit ce12046 into jy0205:main Oct 13, 2024

feifeiobama mentioned this pull request Oct 20, 2024

OOM on 2x4090? #126 (Open)
