A simple text generator for ComfyUI utilizing ExLlamaV2.
Navigate to the root ComfyUI directory and clone the repository to `custom_nodes`:
```
git clone https://github.com/Zuellni/ComfyUI-ExLlama-Nodes custom_nodes/ComfyUI-ExLlamaV2-Nodes
```
Install the requirements depending on your system:
```
pip install -r custom_nodes/ComfyUI-ExLlamaV2-Nodes/requirements-VERSION.txt
```
| File | Description |
| --- | --- |
| `requirements-no-wheels.txt` | ExLlamaV2 and FlashAttention, no wheels. |
| `requirements-torch-21.txt` | Windows wheels for Python 3.11, Torch 2.1, CUDA 12.1. |
| `requirements-torch-22.txt` | Windows wheels for Python 3.11, Torch 2.2, CUDA 12.1. |
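For example, on Windows with Python 3.11, Torch 2.2, and CUDA 12.1:

```
pip install -r custom_nodes/ComfyUI-ExLlamaV2-Nodes/requirements-torch-22.txt
```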
Check what version you need with:
python -c "import platform; import torch; print(f'Python {platform.python_version()}, Torch {torch.__version__}, CUDA {torch.version.cuda}')"
> [!CAUTION]
> If none of the wheels work for you, or there are any ExLlamaV2-related errors while the nodes are loading, try installing it manually following the official instructions. Keep in mind that wheels `>= 0.0.13` require Torch 2.2.
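As a minimal fallback, ExLlamaV2 can also be installed straight from PyPI; note that this may build the extension from source, which requires a working CUDA toolchain:

```
pip install exllamav2
```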
Only EXL2 and 4-bit GPTQ models are supported. You can find a lot of them on Hugging Face. Refer to the model card in each repository for details about quant differences and instruction formats.
To use a model with the nodes, you should clone its repository with git or manually download all the files and place them in `models/llm`.
For example, if you'd like to download Mistral-7B, use the following command:
```
git clone https://huggingface.co/LoneStriker/Mistral-7B-Instruct-v0.2-5.0bpw-h6-exl2-2 models/llm/mistral-7b-exl2-b5
```
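Hugging Face stores large model files with Git LFS, so make sure it is installed and enabled before cloning:

```
git lfs install
```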
> [!TIP]
> You can add your own `llm` path to the `extra_model_paths.yaml` file and place the models there instead.
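As a sketch, an entry like the following could be added to `extra_model_paths.yaml`; the `my_models` key and `base_path` below are placeholders for your own setup:

```yaml
my_models:
    base_path: /path/to/your/models
    llm: llm
```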
| Node | Description |
| --- | --- |
| **Loader** | Loads models from the `llm` directory. |
| `gpu_split` | Comma-separated VRAM in GB per GPU, eg `6.9, 8`. |
| `cache_8bit` | Lower VRAM usage but also lower speed. |
| `max_seq_len` | Max context, higher number equals higher VRAM usage. `0` will default to config. |
| **Generator** | Generates text based on the given prompt. Refer to text-generation-webui for parameters. |
| `unload` | Unloads the model after each generation. |
| `single_line` | Stops the generation on newline. |
| `max_tokens` | Max new tokens, `0` will use available context. |
| **Preview** | Displays generated text in the UI. |
| **Replace** | Replaces variable names enclosed in brackets, eg `[a]`, with their values. |
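As an illustration of the Replace behavior, the substitution works roughly like the sketch below; this is not the node's actual implementation, and the variable inputs are hypothetical:

```python
# Rough sketch of the Replace node's bracket substitution.
# Not the actual implementation; variable names are hypothetical.
def replace(text: str, **variables: str) -> str:
    for name, value in variables.items():
        # Swap every [name] occurrence for its value.
        text = text.replace(f"[{name}]", value)
    return text

print(replace("a portrait of [a] in [b] style", a="a knight", b="baroque"))
# -> "a portrait of a knight in baroque style"
```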
The example workflow is embedded in the image below and can be opened in ComfyUI.