
VLM Pipeline (Intern, LLaVA) #256

Closed · wants to merge 2 commits

Conversation

qcdipankar (Contributor)

Added a generic framework to onboard and run VLMs in QEff.

        return generate_func(**kwargs)

    def generate_inputs_intern(self, **kwargs):
        bs: int = constants.ONNX_EXPORT_EXAMPLE_BATCH_SIZE

Contributor: Please move this inside the modeling file.

        return inputs, dynamic_axes, output_names

    def generate_inputs_llava(self, **kwargs):
        bs: int = constants.ONNX_EXPORT_EXAMPLE_BATCH_SIZE

Contributor: Please move this inside the modeling file, modelling_llava.py.
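
A minimal sketch of what this refactor could look like, assuming each modeling file owns its own input generator and the shared pipeline dispatches on model type; the module layout, registry name, and function signature below are illustrative assumptions, not the PR's actual code.

# Hypothetical contents of the per-model modeling file (e.g. modelling_llava.py)
import torch

def generate_inputs(bs: int, seq_len: int, img_size: int):
    """Build ONNX-export example inputs, dynamic axes and output names for LLaVA."""
    inputs = {
        "input_ids": torch.zeros((bs, seq_len), dtype=torch.int64),
        "pixel_values": torch.zeros((bs, 3, img_size, img_size), dtype=torch.float32),
    }
    dynamic_axes = {
        "input_ids": {0: "batch_size", 1: "seq_len"},
        "pixel_values": {0: "batch_size"},
    }
    output_names = ["logits"]
    return inputs, dynamic_axes, output_names

# Hypothetical registry in the shared pipeline, replacing the per-model
# generate_inputs_intern / generate_inputs_llava methods
INPUT_GENERATORS = {"llava": generate_inputs}  # the intern modeling file would register its own entry

def get_input_generator(model_type: str):
    return INPUT_GENERATORS[model_type]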

# )
# num_logits_to_keep = num_speculative_tokens + 1
# if prefill_seq_len < num_logits_to_keep:
# raise ValueError(

Contributor: Remove commented lines.

generation_len = self.ctx_len - input_len.max() # in standalone this is tensor
assert generation_len > 0, "generation length should be greater than zero"
generated_ids = np.full((batch_size, generation_len + 1), self.processor.tokenizer.pad_token_id)
# inputs["input_ids"]=torch.nn.functional.pad(inputs["input_ids"],(0,self.seq_len_constant-inputs["input_ids"].size(1)),"constant",self.pad_token_id)

Contributor: Remove.

PROMPT_LEN = 8
INPUT_STR = ["My name is"]
GB = 2**30
MAX_QPC_LIMIT = 30
MAX_RETRIES = 5  # Maximum number of retry attempts when downloading a model using huggingface_hub snapshot_download
NUM_SPECULATIVE_TOKENS = 2
CTX_LEN_VLM_LLAVA = 1280
IMG_SIZE = 336

Contributor: Are you using these at the time of export to define shapes?
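
If the question is whether these constants feed the export shapes, a rough sketch of how they could follows, assuming a plain torch.onnx.export call; TinyVLM is a stand-in model added only to make the example runnable, and the input names and dynamic axes are assumptions rather than the PR's export code.

import torch

CTX_LEN_VLM_LLAVA = 1280  # from the constants above
IMG_SIZE = 336            # from the constants above

class TinyVLM(torch.nn.Module):
    # Stand-in model, not the real VLM; it just consumes the two inputs.
    def forward(self, input_ids, pixel_values):
        return input_ids.float().sum() + pixel_values.sum()

example_inputs = (
    torch.zeros((1, CTX_LEN_VLM_LLAVA), dtype=torch.int64),
    torch.zeros((1, 3, IMG_SIZE, IMG_SIZE), dtype=torch.float32),
)
torch.onnx.export(
    TinyVLM(),
    example_inputs,
    "vlm_example.onnx",
    input_names=["input_ids", "pixel_values"],
    output_names=["logits"],
    dynamic_axes={
        "input_ids": {0: "batch_size", 1: "seq_len"},
        "pixel_values": {0: "batch_size"},
    },
)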



# if __name__ == "__main__":
# # model_name = "OpenGVLab/InternVL2_5-1B"

Contributor: Remove commented parts.

@@ -251,6 +252,7 @@ def _compile(
if num_speculative_tokens:
    compile_hash.update(to_hashable({"num_speculative_tokens": num_speculative_tokens}))

# import ipdb; ipdb.set_trace()

Contributor: Remove these lines.

if hasattr(module, "__qeff_init__"):
    module.__qeff_init__()
transformed = True

Contributor: Can we combine both if conditions?
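
One possible restructuring, as a minimal self-contained sketch; the mapping dict, function name, and loop are assumptions about the surrounding module-mapping transform, not the PR's code. Whether the two ifs can literally merge depends on whether transformed should still be set when the module has no __qeff_init__; this sketch keeps that behavior and only flattens the nesting with an early continue.

import torch.nn as nn

def apply_module_mapping(model: nn.Module, module_mapping: dict) -> bool:
    # Swap module classes per module_mapping; call __qeff_init__ when the
    # replacement class defines it. Returns whether anything changed.
    transformed = False
    for _, module in model.named_modules():
        repl_cls = module_mapping.get(type(module))
        if repl_cls is None:
            continue
        module.__class__ = repl_cls
        transformed = True
        if hasattr(repl_cls, "__qeff_init__"):
            module.__qeff_init__()
    return transformed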

input_ids_size = input_ids.shape[1]
# attention_mask = inputs["attention_mask"]
inputs["input_ids"] = torch.nn.functional.pad(
inputs["input_ids"], (0, 3072 - input_ids_size), "constant", self.processor.tokenizer.pad_token_id

Contributor: Please avoid this hardcoded value.

Contributor: Make this value generic: fetch prefill_seq_len from the QPC session, i.e. whatever value the model was compiled for.
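
A minimal sketch of removing the hardcoded 3072, assuming the compiled prefill length is already available to the caller (how it is read from the QPC session is not shown, since the session API isn't in the snippet); pad_to_prefill_len is a hypothetical helper, not part of the PR.

import torch

def pad_to_prefill_len(inputs: dict, prefill_seq_len: int, pad_token_id: int) -> dict:
    # Right-pad input_ids (and attention_mask, if present) to the compiled prefill length.
    cur_len = inputs["input_ids"].shape[1]
    pad = prefill_seq_len - cur_len
    if pad < 0:
        raise ValueError(f"input length {cur_len} exceeds compiled prefill_seq_len {prefill_seq_len}")
    inputs["input_ids"] = torch.nn.functional.pad(
        inputs["input_ids"], (0, pad), "constant", pad_token_id
    )
    if "attention_mask" in inputs:
        inputs["attention_mask"] = torch.nn.functional.pad(inputs["attention_mask"], (0, pad), "constant", 0)
    return inputs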

breakpoint()
self.model.config.use_cache = True
self.processor = processor
self.num_layers = model.config.text_config.num_hidden_layers

Contributor: Make fetching num_layers generic, and the padding shape as well. Please refer to the llava PR and use a similar function that fetches these based on the model architecture.
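
A minimal sketch of a generic num_layers lookup, assuming the common Hugging Face VLM config layouts (text_config for LLaVA-style models, llm_config for InternVL-style ones); the attribute list and helper name are assumptions, to be adjusted to whatever the llava PR settled on.

from transformers import PretrainedConfig

def get_num_hidden_layers(config: PretrainedConfig) -> int:
    # Multimodal configs usually nest the language-model config; plain LMs
    # expose num_hidden_layers at the top level.
    for attr in ("text_config", "llm_config"):
        sub = getattr(config, attr, None)
        if sub is not None and hasattr(sub, "num_hidden_layers"):
            return sub.num_hidden_layers
    return config.num_hidden_layers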

@quic-amitraj (Contributor): Already addressed in #267.
