| Architecture | Models | Example HuggingFace Models |
|---|---|---|
| ChatGLMModel | ChatGLM | |
| GemmaForCausalLM | Gemma | |
| GPTNeoXForCausalLM | Dolly<br>RedPajama | |
| LlamaForCausalLM | Llama 3<br>Llama 2<br>OpenLLaMA<br>TinyLlama | |
| MistralForCausalLM | Mistral<br>Notus<br>Zephyr | |
| PhiForCausalLM | Phi | |
| QWenLMHeadModel | Qwen | |
**Note:** LoRA adapters are supported; a usage sketch follows.
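As an illustration, here is a minimal sketch of attaching a LoRA adapter through the `openvino_genai` Python API; the model directory, adapter file, and prompt are placeholders:

```python
import openvino_genai

# Placeholder paths: a converted model directory and a LoRA adapter
# in safetensors format.
adapter = openvino_genai.Adapter("adapter_model.safetensors")
adapter_config = openvino_genai.AdapterConfig(adapter)

# Register the adapter when the pipeline is built ...
pipe = openvino_genai.LLMPipeline("model_dir", "CPU", adapters=adapter_config)

# ... then select it per generate() call.
print(pipe.generate("What is OpenVINO?", max_new_tokens=100, adapters=adapter_config))
```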
The pipeline can work with other similar topologies produced by `optimum-intel` with the same model signature. After conversion, the model is required to have the following inputs:

1. `input_ids` contains the tokens.
2. `attention_mask` is filled with `1`.
3. `beam_idx` selects beams.
4. `position_ids` (optional) encodes the position of the currently generated token in the sequence.

It must also expose a single `logits` output. A minimal signature check is sketched below.
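The following sketch (assuming the `openvino` Python package and a placeholder `model_dir/openvino_model.xml` produced by `optimum-intel`) verifies that a converted model exposes these inputs and output:

```python
import openvino as ov

model = ov.Core().read_model("model_dir/openvino_model.xml")  # placeholder path

input_names = {name for port in model.inputs for name in port.get_names()}
output_names = {name for port in model.outputs for name in port.get_names()}

# input_ids, attention_mask, and beam_idx are required; position_ids is optional.
missing = {"input_ids", "attention_mask", "beam_idx"} - input_names
print("missing required inputs:", missing or "none")
print("optional position_ids present:", "position_ids" in input_names)
print("single logits output:", len(model.outputs) == 1 and "logits" in output_names)
```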
**Note:** Models should belong to the same family and have the same tokenizers.
| Architecture | Models | LoRA support | Example HuggingFace Models | Notes |
|---|---|---|---|---|
| InternVL2 | InternVL2 | Not supported | | |
| LLaVA | LLaVA-v1.5 | Not supported | | |
| LLaVA-NeXT | LLaVA-v1.6 | Not supported | | |
| MiniCPMV | MiniCPM-V-2_6 | Not supported | | |
| Phi3VForCausalLM | phi3_v | Not supported | | Override the default `eos_token_id` with the one from the tokenizer: `generation_config.set_eos_token_id(pipe.get_tokenizer().get_eos_token_id())`. See the sketch below the table. |
| Qwen2-VL | Qwen2-VL | Not supported | | |
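To apply the `eos_token_id` override from the Phi3VForCausalLM note, here is a minimal sketch with the `openvino_genai` Python API; the model directory and prompt are placeholders, and image input is omitted for brevity:

```python
import openvino_genai

pipe = openvino_genai.VLMPipeline("phi3_v_model_dir", "CPU")  # placeholder path

# The models' configs aren't consistent, so take eos_token_id from the tokenizer.
generation_config = openvino_genai.GenerationConfig()
generation_config.set_eos_token_id(pipe.get_tokenizer().get_eos_token_id())
generation_config.max_new_tokens = 100

print(pipe.generate("Describe the image.", generation_config=generation_config))
```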
| Architecture | Models | LoRA support | Example HuggingFace Models |
|---|---|---|---|
| WhisperForConditionalGeneration | Whisper<br>Distil-Whisper | Not supported | |
If https://huggingface.co/ is down, the conversion step won't be able to download the models.
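For reference, a minimal conversion sketch using the `optimum-intel` Python API (assuming `optimum[openvino]` is installed; the TinyLlama model id and output directory are examples only):

```python
from optimum.intel import OVModelForCausalLM
from transformers import AutoTokenizer

model_id = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"  # example id from the table above

# Downloads the model from huggingface.co and exports it to OpenVINO IR in one step.
model = OVModelForCausalLM.from_pretrained(model_id, export=True)
tokenizer = AutoTokenizer.from_pretrained(model_id)

model.save_pretrained("TinyLlama-1.1B-Chat-v1.0-ov")
tokenizer.save_pretrained("TinyLlama-1.1B-Chat-v1.0-ov")
```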