
StarDoc model training #5

Draft · wants to merge 8 commits into `main`
Conversation

akshaykalkunte (Contributor)
WIP StarDoc model integration into FastLLM

tscholak (Collaborator) commented Nov 11, 2024

Hi @jlamypoirier! @akshaykalkunte and I talked, and we want to push this PR over the finish line. There's a lot going on here, and we should review the approach top-down to decide how it needs to be refactored before it can go into main. Off the top of my head, these are the separate concerns:

  1. Model architecture: Are VLMs GPTs from the point of view of Fast-LLM? I think they aren't, because too much is different. We should add a new model architecture (e.g. `vlm`) to Fast-LLM.
  2. Data preprocessing: Related to "Add prepare command" #38, we should factor out data preprocessing and introduce an offline preprocessing step, `fast-llm prepare_data vlm --config stardoc.yaml`, that builds `VLMMemmapDataset`s and stores them on disk.
  3. Vision encoder implementation: Right now it's a monolithic wrapper layer that uses a HF auto model. We should discuss if and when we reimplement this in Fast-LLM. This can be a separate effort and could (as a side effect) yield yet another model class, `vision_encoder`, that we could also train from scratch if we wanted to.
  4. Cross-attention instead of adapter layer: StarDoc is moving towards a special form of cross-attention between the vision encoder and the LM decoder. This likely has implications for parallelization.
  5. Llama 3 support: StarDoc will use pre-trained Llama 3.2 (text-only?) models, so we need to be able to load them. See also "[feat] Llama 3.x rope scaling support" #39.
  6. YAML configs: This PR currently doesn't support Fast-LLM's new YAML-based configs.
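To make points 2 and 6 concrete, here is a rough sketch of what a `stardoc.yaml` for the proposed preprocessing step could look like. None of these keys exist in Fast-LLM yet; every field name, path, and model identifier below is a placeholder for discussion only:

```yaml
# Hypothetical stardoc.yaml -- no key here is a real Fast-LLM config field yet.
model:
  type: vlm                              # proposed new model architecture
dataset:
  source: /data/stardoc/raw              # raw (image, text) pairs
  output: /data/stardoc/memmap           # where the VLMMemmapDataset shards would go
tokenizer:
  path: meta-llama/Llama-3.2-3B          # placeholder Llama 3.2 checkpoint
vision_encoder:
  pretrained: openai/clip-vit-large-patch14  # placeholder HF auto model
  image_resolution: 336
```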

I think we can divide and conquer here.
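As a strawman for the `VLMMemmapDataset` idea in point 2, the sketch below shows the general offline-preprocess-then-mmap pattern using only the Python standard library: write an index plus flat sample payloads once, then read individual (tokens, image) samples back via `mmap` without loading the file into memory. The names `write_dataset` and `MemmapDataset`, and the on-disk layout, are made up here and are not Fast-LLM APIs:

```python
# Minimal sketch of a memmap-style dataset for (token_bytes, image_bytes) samples.
# Hypothetical layout: [sample count][index entries][payloads]; not Fast-LLM's format.
import mmap
import struct

HEADER = struct.Struct("<Q")    # number of samples
ENTRY = struct.Struct("<QQQQ")  # token offset, token length, image offset, image length


def write_dataset(path, samples):
    """Offline preprocessing step: write samples (pairs of bytes) to one flat file."""
    samples = list(samples)
    index_size = HEADER.size + ENTRY.size * len(samples)
    offset = index_size
    entries = []
    for tok, img in samples:
        entries.append((offset, len(tok), offset + len(tok), len(img)))
        offset += len(tok) + len(img)
    with open(path, "wb") as f:
        f.write(HEADER.pack(len(samples)))
        for entry in entries:
            f.write(ENTRY.pack(*entry))
        for tok, img in samples:
            f.write(tok)
            f.write(img)


class MemmapDataset:
    """Training-time reader: random access to samples through a memory map."""

    def __init__(self, path):
        self._file = open(path, "rb")
        self._mm = mmap.mmap(self._file.fileno(), 0, access=mmap.ACCESS_READ)
        (self._len,) = HEADER.unpack_from(self._mm, 0)

    def __len__(self):
        return self._len

    def __getitem__(self, i):
        toff, tlen, ioff, ilen = ENTRY.unpack_from(
            self._mm, HEADER.size + ENTRY.size * i
        )
        return self._mm[toff:toff + tlen], self._mm[ioff:ioff + ilen]
```

The real implementation would of course store tokenized text and encoded image tensors rather than raw bytes, and shard across files, but the index-plus-payload structure is the part worth agreeing on early.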

@tscholak tscholak mentioned this pull request Nov 11, 2024