  A factory class for creating QEFFAutoModelForImageTextToText instances for the single and dual QPC approaches.
+ The QEFFAutoModelForImageTextToText class is used to work with multimodal language models from the HuggingFace hub.
+ While you can initialize the class directly, it's best to use the ``from_pretrained`` method for this purpose. This class supports both single and dual QPC approaches.
  Attributes:
  _hf_auto_class (class): The Hugging Face AutoModel class for ImageTextToText models.
+
+ ``Mandatory`` Args:
+ :pretrained_model_name_or_path (str): Model card name from HuggingFace or local path to model directory.
+
+ ``Optional`` Args:
+ :kv_offload (bool): Flag to toggle between the single and dual QPC approaches. If set to False, the single QPC approach is used; otherwise, the dual QPC approach is used. Defaults to True.
+
+ .. code-block:: python
+ import requests
+ from PIL import Image
+ from transformers import AutoProcessor, TextStreamer
+
+ from QEfficient import QEFFAutoModelForImageTextToText
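The `kv_offload` toggle described in the docstring above can be pictured as a factory that hands back one of two implementations. The following is only a minimal sketch of that pattern: `SingleQPCModel`, `DualQPCModel`, and `ImageTextToTextFactory` are hypothetical stand-ins invented for illustration, not classes from QEfficient, and the real class may dispatch differently.

```python
class SingleQPCModel:
    """Hypothetical stand-in for the single QPC approach."""

    qpc_count = 1

    def __init__(self, model_name):
        self.model_name = model_name


class DualQPCModel:
    """Hypothetical stand-in for the dual QPC approach."""

    qpc_count = 2

    def __init__(self, model_name):
        self.model_name = model_name


class ImageTextToTextFactory:
    """Hypothetical factory: picks an implementation from `kv_offload`."""

    def __new__(cls, model_name, kv_offload=True):
        # kv_offload=True (the documented default) selects the dual QPC path;
        # kv_offload=False selects the single QPC path.
        impl = DualQPCModel if kv_offload else SingleQPCModel
        return impl(model_name)

    @classmethod
    def from_pretrained(cls, model_name, kv_offload=True):
        # Mirrors the documented entry point; no model is loaded in this sketch.
        return cls(model_name, kv_offload=kv_offload)


default_model = ImageTextToTextFactory.from_pretrained("some/model-card")
single_model = ImageTextToTextFactory.from_pretrained("some/model-card", kv_offload=False)
print(type(default_model).__name__, default_model.qpc_count)
print(type(single_model).__name__, single_model.qpc_count)
```

Because `__new__` returns an object that is not an instance of the factory itself, callers receive the concrete implementation directly and can inspect which approach was chosen.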
@@ -1219,7 +1280,6 @@ class QEFFAutoModelForCausalLM(QEFFBaseModel):
  :model (nn.Module): PyTorch model
  :continuous_batching (bool): Whether this model will be used for continuous batching in the future. If this is not set to True here, the model cannot be exported or compiled for continuous batching later.
  :is_tlm (bool): Whether this is a Speculative Decoding Target Language Model. If set to True, the `num_logits_to_keep` input array must be fed to control the number of logits returned during prefill/decode.
- :enable_qnn (bool): Enables QNN Compilation path for the model.
This method serves as the easiest entry point into using QEfficient. The interface is designed to be similar to transformers.AutoModelForCausalLM.
@@ -1314,7 +1365,6 @@ def from_pretrained(
  :pretrained_name_or_path (str): Model card name from HuggingFace or local path to model directory.
  :continuous_batching (bool): Whether this model will be used for continuous batching in the future. If this is not set to True here, the model cannot be exported or compiled for continuous batching later.
  :is_tlm (bool): Whether this is a Speculative Decoding Target Language Model. If set to True, the `num_logits_to_keep` input array must be fed to control the number of logits returned during prefill/decode.
- :enable_qnn (bool): Enables QNN Compilation path for the model.
  :args, kwargs: Additional arguments to pass to transformers.AutoModelForCausalLM.
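To make the `is_tlm` description more concrete, here is a small, hedged sketch of what "controlling the number of returned logits" could look like. It assumes the convention that only the last `num_logits_to_keep` sequence positions are returned; the helper `keep_last_logits` is hypothetical and works on plain Python lists, so it says nothing about how QEfficient implements this on device.

```python
def keep_last_logits(logits, num_logits_to_keep):
    """Return the logits of the last `num_logits_to_keep` positions.

    `logits` is a list with one entry per sequence position; each entry is
    a list of vocabulary scores. This is an illustrative assumption about
    the semantics, not the library's implementation.
    """
    if num_logits_to_keep < 1:
        raise ValueError("num_logits_to_keep must be at least 1")
    return logits[-num_logits_to_keep:]


# Four sequence positions, toy vocabulary of three tokens.
prefill_logits = [
    [0.1, 0.2, 0.7],
    [0.3, 0.3, 0.4],
    [0.9, 0.05, 0.05],
    [0.2, 0.5, 0.3],
]

kept = keep_last_logits(prefill_logits, 2)
print(len(kept))
```

In speculative decoding the target model typically needs scores for several trailing positions at once, which is why the count is exposed as an input rather than fixed at one.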
  num_cores: int = 16,  # FIXME: Make this mandatory arg
  mxfp6_matmul: bool = False,
+ mxint8_kv_cache: bool = False,
+ num_speculative_tokens: Optional[int] = None,
+ enable_qnn: bool = False,
+ qnn_config: Optional[str] = None,
  **compiler_options,
  ) -> str:
  """
@@ -1762,19 +1817,41 @@ def compile(
  ``Optional`` Args:
  :onnx_path (str, optional): Path to pre-exported onnx model.
  :compile_dir (str, optional): Path for saving the qpc generated.
- :seq_len (int, optional): The length of the prompt should be less than ``seq_len``. ``Defaults to 32``.
+ :encoder_ctx_len (int, optional): The maximum length of context for the encoder, based on the AutoProcessor output. ``Defaults to the value in the model config; if the config has none, 1500``.
+ :ctx_len (int, optional): The maximum length of context to keep for decoding. ``Defaults to 150``.
  :batch_size (int, optional): Batch size. ``Defaults to 1``.
  :num_devices (int): Number of devices the model needs to be compiled for. Defaults to 1.
  :num_cores (int): Number of cores used to compile the model.
  :mxfp6_matmul (bool, optional): Whether to use ``mxfp6`` compression for weights. ``Defaults to False``.
  :aic_enable_depth_first (bool, optional): Enables DFS with default memory size. ``Defaults to False``.
- :allow_mxint8_mdp_io (bool, optional): Allows MXINT8 compression of MDP IO traffic. ``Defaults to False.``
+
+ Other args are not yet implemented for AutoModelForSpeechSeq2Seq
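The fallback order documented for `encoder_ctx_len` (explicit argument, then the model config, then 1500) can be sketched as below. This is an illustration under assumptions: the helper `resolve_encoder_ctx_len` is hypothetical, and the config attribute name `max_source_positions` is assumed for the example rather than taken from the diff.

```python
from types import SimpleNamespace

DEFAULT_ENCODER_CTX_LEN = 1500  # documented fallback value


def resolve_encoder_ctx_len(config, encoder_ctx_len=None):
    """Pick the encoder context length using the documented fallback order."""
    # 1. An explicitly passed value wins.
    if encoder_ctx_len is not None:
        return encoder_ctx_len
    # 2. Otherwise consult the config (attribute name is an assumption).
    from_config = getattr(config, "max_source_positions", None)
    if from_config is not None:
        return from_config
    # 3. Otherwise fall back to the documented default.
    return DEFAULT_ENCODER_CTX_LEN


config_with_value = SimpleNamespace(max_source_positions=750)
config_without_value = SimpleNamespace(max_source_positions=None)

print(resolve_encoder_ctx_len(config_with_value))
print(resolve_encoder_ctx_len(config_without_value))
print(resolve_encoder_ctx_len(config_with_value, encoder_ctx_len=1000))
```

Checking `is not None` rather than truthiness keeps an explicitly passed value from being silently replaced by a fallback.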