PaLI-GEMMA Model Reproduction

A reproduction of Google's PaLI-GEMMA, an open vision-language model, written mainly for self-study. The project follows Google's official code and the associated research paper, and includes detailed notes on the tensor shape transformations inside the network.

Key Points

Focus: Document the tensor shape transformation at each layer so the model structure is easier to study.
Goal: Provide a clear, readable version of PaLI-GEMMA for self-study.
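
As an illustration of the kind of shape notes this project keeps, here is a minimal sketch that traces tensor shapes from the input image to the decoder input. It uses only the standard library. The names `ShapeConfig` and `trace_shapes` are hypothetical and do not come from this repository, and the default sizes are assumed from the public 224-pixel PaLI-GEMMA 3B configuration, so adjust them if your checkpoint differs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ShapeConfig:
    # Assumed values for the 224-pixel PaLI-GEMMA 3B checkpoint.
    image_size: int = 224      # square input resolution in pixels
    patch_size: int = 14       # edge length of one vision patch
    vision_hidden: int = 1152  # embedding width of the vision encoder
    text_hidden: int = 2048    # embedding width of the language model


def trace_shapes(batch: int, text_tokens: int, cfg: ShapeConfig = ShapeConfig()):
    """Return (stage, shape) pairs from the raw image to the decoder input."""
    grid = cfg.image_size // cfg.patch_size  # patches per side
    num_patches = grid * grid                # image tokens per picture
    seq_len = num_patches + text_tokens      # image tokens precede the text tokens
    return [
        ("pixel_values", (batch, 3, cfg.image_size, cfg.image_size)),
        ("patch_conv", (batch, cfg.vision_hidden, grid, grid)),
        ("flatten_transpose", (batch, num_patches, cfg.vision_hidden)),
        ("projector", (batch, num_patches, cfg.text_hidden)),
        ("merged_sequence", (batch, seq_len, cfg.text_hidden)),
    ]


if __name__ == "__main__":
    for stage, shape in trace_shapes(batch=1, text_tokens=8):
        print(f"{stage:18s} {shape}")
```

With the assumed defaults, a 224x224 image becomes a 16x16 grid of patches, so each image contributes 256 tokens to the sequence the decoder sees.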

Setup

Download Model Weights: Download the PaLI-GEMMA 3B weights from Hugging Face and save them to a local directory.
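
If you prefer to script the download, one possible approach is the `snapshot_download` function from the `huggingface_hub` package. This is only a sketch: the repository ID below is an assumption (this README does not say which 3B variant it uses), `download_weights` is a hypothetical helper, and `huggingface_hub` must be installed separately. The checkpoint is gated on Hugging Face, so you will likely need to accept the license on the model page and authenticate first, for example with `huggingface-cli login`.

```python
# Assumed repository ID; replace it with the variant you actually want.
DEFAULT_REPO_ID = "google/paligemma-3b-pt-224"


def download_weights(local_dir: str, repo_id: str = DEFAULT_REPO_ID) -> str:
    """Download every file of the model repository into local_dir.

    Returns the path of the folder that holds the downloaded files.
    """
    # Imported lazily so this module loads even when huggingface_hub is missing.
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=repo_id, local_dir=local_dir)


if __name__ == "__main__":
    # "./paligemma-weights" is a placeholder directory name.
    print(download_weights("./paligemma-weights"))
```

The returned directory is the value to use for MODEL_PATH in the configuration step.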

Clone the repository and install dependencies:

git clone git@github.com:CazeroZ/PaliGemma_repro.git
cd PaliGemma_repro
pip install -r requirement.txt

Configure the Inference Script:
- Open launch_inference.sh and modify the following variables as needed:
  - MODEL_PATH: Set this to the directory where the downloaded model weights are saved.
  - PROMPT: Update with the prompt you want to use for inference.
  - IMAGE_FILE_PATH: Set this to the path of the input image.
Run the Inference:
```
sh launch_inference.sh
```
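
A wrong path is the most likely reason for the script to fail, so it can help to check the two path variables before launching. The sketch below is not part of this repository. `check_inference_inputs` is a hypothetical helper, and the `config.json` check assumes the weights were saved in the usual Hugging Face checkpoint layout.

```python
from pathlib import Path


def check_inference_inputs(model_path: str, image_file_path: str) -> list[str]:
    """Return human-readable problems; an empty list means both paths look usable."""
    problems = []

    model_dir = Path(model_path)
    if not model_dir.is_dir():
        problems.append(f"MODEL_PATH is not a directory: {model_dir}")
    elif not (model_dir / "config.json").is_file():
        # Assumption: a Hugging Face checkpoint directory contains config.json.
        problems.append(f"MODEL_PATH has no config.json: {model_dir}")

    image = Path(image_file_path)
    if not image.is_file():
        problems.append(f"IMAGE_FILE_PATH is not a file: {image}")

    return problems


if __name__ == "__main__":
    # Placeholder paths; use the same values you set in launch_inference.sh.
    for problem in check_inference_inputs("./paligemma-weights", "test_images/example.jpg"):
        print(problem)
```

An empty result only means the paths exist. It does not verify that the weights are complete or that the image can be decoded.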
