This repository contains a reproduction of Google's PaliGemma model, focusing on its inference capabilities.

CazeroZ/PaliGemma_repro

PaliGemma Model Reproduction

A reproduction of Google's PaliGemma, an open vision-language model, written mainly for self-study. The code follows Google's official implementation and the associated research paper, and it includes detailed notes on how tensor shapes change as data moves through the network.

Key Points

  • Focus: Documenting the tensor shape transformation at each layer, to make the model structure easier to study in depth.
  • Goal: To provide a clear, readable version of PaliGemma for self-study.
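
As a rough illustration of the kind of shape bookkeeping described above, the sketch below traces the main tensor shapes from the input image to the merged image-plus-text sequence. It allocates no tensors and does not use this repository's code. The function name `trace_shapes` is hypothetical, and the sizes (224-pixel input, 14-pixel patches, vision width 1152, text width 2048) are assumptions taken from the publicly described 224-pixel PaliGemma 3B configuration, so check them against the `config.json` of the checkpoint you download.

```python
# Shape bookkeeping only: no tensors are allocated and no model code is called.
# ASSUMPTION: these sizes match a 224-pixel PaliGemma 3B checkpoint.
IMAGE_SIZE = 224      # input resolution in pixels (assumed)
PATCH_SIZE = 14       # side length of one square image patch (assumed)
VISION_HIDDEN = 1152  # width of the vision encoder output (assumed)
TEXT_HIDDEN = 2048    # width of the language model embeddings (assumed)


def trace_shapes(batch_size, num_text_tokens):
    """Return (name, shape) pairs for the main stages of the forward pass."""
    patches_per_side = IMAGE_SIZE // PATCH_SIZE
    num_image_tokens = patches_per_side ** 2
    seq_len = num_image_tokens + num_text_tokens
    return [
        # Raw image batch: (batch, channels, height, width)
        ("pixel_values", (batch_size, 3, IMAGE_SIZE, IMAGE_SIZE)),
        # One embedding per patch after the vision encoder
        ("patch_embeddings", (batch_size, num_image_tokens, VISION_HIDDEN)),
        # Linear projection into the language model's embedding width
        ("projected_image_features", (batch_size, num_image_tokens, TEXT_HIDDEN)),
        # Embedded prompt tokens
        ("text_embeddings", (batch_size, num_text_tokens, TEXT_HIDDEN)),
        # Image tokens and prompt tokens joined along the sequence axis
        ("merged_sequence", (batch_size, seq_len, TEXT_HIDDEN)),
    ]


if __name__ == "__main__":
    for name, shape in trace_shapes(batch_size=1, num_text_tokens=8):
        print(f"{name:26s} {shape}")
```

Under these assumed sizes, a 224-pixel image yields 16 × 16 = 256 image tokens, so an 8-token prompt gives a merged sequence of length 264.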

Setup

  1. Download Model Weights: Download the PaliGemma 3B model weights from Hugging Face.

  2. Clone the repository and install dependencies:

    git clone git@github.com:CazeroZ/PaliGemma_repro.git
    cd PaliGemma_repro
    pip install -r requirements.txt

  3. Configure the Inference Script:

    • Open launch_inference.sh and modify the following variables as needed:
      • MODEL_PATH: Set this to the directory where the downloaded model weights are saved.
      • PROMPT: Update with the prompt you want to use for inference.
      • IMAGE_FILE_PATH: Set this to the path of the input image.

  4. Run the Inference:

    sh launch_inference.sh
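
Before running the script, it can help to confirm that the paths set in step 3 point at real files. The following is a minimal sketch of such a check and is not part of this repository. The helper name `check_inference_inputs` is hypothetical, and the expected file names (`config.json` plus at least one `*.safetensors` file) are assumptions based on the usual layout of Hugging Face checkpoints; adjust them if your download looks different.

```python
from pathlib import Path


def check_inference_inputs(model_path, image_file_path):
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    model_dir = Path(model_path)

    if not model_dir.is_dir():
        problems.append(f"MODEL_PATH is not a directory: {model_dir}")
    else:
        # ASSUMPTION: the checkpoint follows the common Hugging Face layout.
        if not (model_dir / "config.json").is_file():
            problems.append(f"No config.json found in {model_dir}")
        if not any(model_dir.glob("*.safetensors")):
            problems.append(f"No *.safetensors weight files found in {model_dir}")

    if not Path(image_file_path).is_file():
        problems.append(f"IMAGE_FILE_PATH does not exist: {image_file_path}")

    return problems


if __name__ == "__main__":
    # Placeholder values; use the same paths you set in launch_inference.sh.
    for problem in check_inference_inputs("path/to/weights", "path/to/image.jpg"):
        print("WARNING:", problem)
```

Returning a list rather than raising on the first failure means every misconfigured path is reported in a single run.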
