Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities
Mobile-Agent: The Powerful Mobile Device Operation Assistant Family
Code and models for NExT-GPT: Any-to-Any Multimodal Large Language Model
[CVPR'25] Official Implementations for Paper - MagicQuill: An Intelligent Interactive Image Editing System
Reasoning in LLMs: Papers and Resources, including Chain-of-Thought, OpenAI o1, and DeepSeek-R1 🍓
InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions
mPLUG-DocOwl: Modularized Multimodal Large Language Model for Document Understanding
Cambrian-1 is a family of multimodal LLMs with a vision-centric design.
[ICLR 2025] Agent S: an open agentic framework that uses computers like a human
🔥 Sa2VA: Marrying SAM2 with LLaVA for Dense Grounded Understanding of Images and Videos
[CVPR2024] The code for "Osprey: Pixel Understanding with Visual Instruction Tuning"
✨✨Woodpecker: Hallucination Correction for Multimodal Large Language Models
OpenEMMA, a permissively licensed open source "reproduction" of Waymo’s EMMA model.
SpatialLM: Large Language Model for Spatial Understanding
[ECCV2024] Grounded Multimodal Large Language Model with Localized Visual Tokenization
NeurIPS 2024 Paper: A Unified Pixel-level Vision LLM for Understanding, Generating, Segmenting, Editing