Increase batch size results in a non-linear increase in computation time #616

Open
Wonder1905 opened this issue Dec 22, 2024 · 2 comments

Comments

@Wonder1905

Hi, I noticed that increasing the batch size, in training or inference, results in a non-linear increase in computation time (we would expect a linear increase to be an upper bound in some sense).
I saw it in my own environment, built another one, and in the end also tried it in Colab. Here is the Colab code:

from transformers import Qwen2VLForConditionalGeneration, AutoTokenizer, AutoProcessor
from qwen_vl_utils import process_vision_info
import numpy as np
import time
import torch
model = Qwen2VLForConditionalGeneration.from_pretrained(
    "Qwen/Qwen2-VL-2B-Instruct", torch_dtype="float16", device_map="auto"
)

processor = AutoProcessor.from_pretrained("Qwen/Qwen2-VL-2B-Instruct", torch_dtype="float16")
messages1 = [
    {
        "role": "user",
        "content": [
            {"type": "image", "resized_height": 256, "resized_width": 256, "image": "https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-VL/assets/demo.jpeg"},
            {"type": "image", "resized_height": 256, "resized_width": 256, "image": "https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-VL/assets/demo.jpeg"},
            {"type": "text", "text": "What are the common elements in these pictures?"},
        ],
    }
]


#messages = [messages1,messages1,messages1,messages1]
#messages = [messages1,messages1,messages1]
#messages = [messages1,messages1]
messages = [messages1]
# Preparation for batch inference
texts = [
    processor.apply_chat_template(msg, tokenize=False, add_generation_prompt=True)
    for msg in messages
]
image_inputs, video_inputs = process_vision_info(messages)
inputs = processor(
    text=texts,
    images=image_inputs,
    videos=video_inputs,
    padding=True,
    return_tensors="pt",
)
inputs = inputs.to("cuda")

avg_list = []
for attempt in range(100):
    with torch.no_grad():
        torch.cuda.synchronize()
        start = time.perf_counter()

        # Single forward pass (not generate()); this is the call being timed.
        outputs = model(**inputs)
        torch.cuda.synchronize()
        end = time.perf_counter()
        avg_list.append(end-start)
        print(f"forward pass time: {end - start}")
print("Avg of 100:",np.mean(avg_list))
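The loop above times every iteration, including the first ones, which on a GPU often include one-off setup cost. A minimal sketch of a timing helper that discards warm-up runs and reports the median alongside the mean is given here; `benchmark`, `warmup`, `iters` and `sync` are hypothetical names that do not appear in the original script.

```python
import statistics
import time


def benchmark(fn, warmup=5, iters=20, sync=None):
    """Time fn() after warm-up runs; return (median, mean) in seconds.

    sync is an optional callable (for example torch.cuda.synchronize)
    that is invoked right before starting and right before stopping the clock.
    """
    # Warm-up runs are executed but not recorded.
    for _ in range(warmup):
        fn()

    samples = []
    for _ in range(iters):
        if sync is not None:
            sync()
        start = time.perf_counter()
        fn()
        if sync is not None:
            sync()
        samples.append(time.perf_counter() - start)

    # The median is less sensitive to occasional slow iterations than the mean.
    return statistics.median(samples), statistics.fmean(samples)
```

With the script above it could be called, inside `torch.no_grad()`, as `benchmark(lambda: model(**inputs), sync=torch.cuda.synchronize)`.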

Since Colab can be unpredictable in how it allocates resources, I did 10 runs of 100 iterations each and removed outliers. The results were:
Batch=4: 1.4s
Batch=3: 0.88s
Batch=2: 0.51s
Batch=1: 0.19s
We can see the non-linear increase; furthermore, when the image size increases, the growth is much sharper.
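To make the deviation from linear scaling explicit, the reported numbers can be compared against a linear extrapolation of the batch-1 time. This is only arithmetic on the four timings listed above; nothing is measured again.

```python
# Mean forward-pass times reported above, in seconds, keyed by batch size.
reported = {1: 0.19, 2: 0.51, 3: 0.88, 4: 1.4}

base = reported[1]
rows = []
for batch, seconds in sorted(reported.items()):
    per_sample = seconds / batch          # time per message in the batch
    vs_linear = seconds / (base * batch)  # 1.0 would mean perfectly linear scaling
    rows.append((batch, per_sample, vs_linear))
    print(f"batch={batch}: {per_sample:.3f} s/sample, "
          f"{vs_linear:.2f}x the linear extrapolation")
```

At batch size 4 the measured time is roughly 1.8 times what linear scaling from batch size 1 would predict.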
After debugging, I found that it happens in the visual module:
self.visual = Qwen2VisionTransformerPretrainedModel._from_config(config.vision_config)

In the scaled_dot_product_attention call:
attn_output = F.scaled_dot_product_attention(q, k, v, attention_mask, dropout_p=0.0)

This is probably because you are treating the batch of images as a single sequence and handling it with an attention mask. But sequence length is the biggest pain point in transformers, so why is it implemented this way?

Am I missing something?
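A rough way to see why packing could scale this way: if all images are concatenated into one sequence and separated only by a block-diagonal mask, a dense masked attention computes a score for every pair of tokens in the packed sequence, even though only pairs within the same image are kept. The sketch below counts both quantities in plain Python. `mask_from_cu_seqlens`, `score_counts` and `tokens_per_image` are hypothetical names, and whether a given `scaled_dot_product_attention` backend actually skips the masked blocks is an assumption this sketch does not settle.

```python
def mask_from_cu_seqlens(cu_seqlens):
    """Build a block-diagonal boolean mask from cumulative sequence lengths.

    cu_seqlens = [0, n1, n1 + n2, ...]; token i may attend to token j
    only when both fall inside the same [start, end) segment.
    """
    total = cu_seqlens[-1]
    mask = [[False] * total for _ in range(total)]
    for start, end in zip(cu_seqlens, cu_seqlens[1:]):
        for i in range(start, end):
            for j in range(start, end):
                mask[i][j] = True
    return mask


def score_counts(num_images, tokens_per_image):
    """Return (dense, useful) attention-score counts for a packed sequence."""
    cu_seqlens = [i * tokens_per_image for i in range(num_images + 1)]
    mask = mask_from_cu_seqlens(cu_seqlens)
    dense = len(mask) * len(mask)                     # grows with (B * n) ** 2
    useful = sum(row.count(True) for row in mask)     # grows with B * n ** 2
    return dense, useful


for num_images in (1, 2, 4, 8):
    dense, useful = score_counts(num_images, tokens_per_image=16)
    print(f"{num_images} images: dense={dense}, useful={useful}")
```

Note that each message in the script carries two images, so a batch of four messages packs eight images into one visual sequence. Under this counting, the dense work grows quadratically with the number of images while the useful work grows only linearly.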

@zwplus

zwplus commented Jan 8, 2025

I encountered the same issue and noticed that the VisionTransformer in Qwen2-VL treats batched images as a single sequence. This leads to significant increases in memory usage during training, especially with larger batch sizes and high-resolution images. To address this, I modified the VisionTransformer in Qwen2-VL to use batch-based processing logic. Everything seems to work normally now. I also tested it using your script, and the inference time looks much more reasonable. I hope this solution helps!
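The modified code is not included in this thread, so the following is not zwplus's change. It is only a minimal pure-Python sketch of why a batch-based variant can give the same result: attention over a packed sequence with a block-diagonal mask matches attention run separately on each image. The `attention` helper, the token counts in `lengths` and the random inputs are all assumptions made for illustration; a real implementation would work on batched tensors and pad images to a common length.

```python
import math
import random


def attention(q, k, v, mask=None):
    """Single-head scaled dot-product attention on lists of float vectors.

    Masked-out pairs (mask[i][j] is False) are dropped before the softmax,
    which is equivalent to giving them a score of minus infinity.
    """
    scale = 1.0 / math.sqrt(len(q[0]))
    out = []
    for i, qi in enumerate(q):
        keep = [j for j in range(len(k)) if mask is None or mask[i][j]]
        scores = [scale * sum(a * b for a, b in zip(qi, k[j])) for j in keep]
        peak = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(s - peak) for s in scores]
        norm = sum(weights)
        out.append([
            sum(w * v[j][d] for w, j in zip(weights, keep)) / norm
            for d in range(len(v[0]))
        ])
    return out


rng = random.Random(0)


def rand_tokens(n, dim):
    return [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(n)]


lengths = [3, 5]  # hypothetical token counts for two images
dim = 4
images = [(rand_tokens(n, dim), rand_tokens(n, dim), rand_tokens(n, dim))
          for n in lengths]

# Packed variant: one long sequence plus a block-diagonal mask.
q = [tok for img in images for tok in img[0]]
k = [tok for img in images for tok in img[1]]
v = [tok for img in images for tok in img[2]]
total = sum(lengths)
mask = [[False] * total for _ in range(total)]
offset = 0
for n in lengths:
    for i in range(offset, offset + n):
        for j in range(offset, offset + n):
            mask[i][j] = True
    offset += n
packed = attention(q, k, v, mask)

# Batch-based variant: each image attends only to itself, no mask needed.
per_image = [row for qi, ki, vi in images for row in attention(qi, ki, vi)]

max_diff = max(abs(a - b)
               for ra, rb in zip(packed, per_image)
               for a, b in zip(ra, rb))
print("max abs difference:", max_diff)
```

The per-image variant never forms scores between tokens of different images, which is where the memory and time savings described above would come from.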

@Wonder1905

Any chance of sharing the modified code?
