Rewrite `mergekit-extract-lora` #505

cg123 · 2025-02-07T00:41:57Z

Now with better embedding handling, multi-GPU execution, and lazy loading/saving of tensors.

When extracting a LoRA from an 8B model, execution time goes from ~6 minutes down to 40 seconds with `--cuda --multi-gpu` on an 8-GPU machine.

Additionally, the `--sv-epsilon` flag can be used to set a tolerance for singular values, opportunistically reducing rank when the fine-tuned difference is inherently lower rank.
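To illustrate the idea behind a singular-value tolerance, here is a minimal NumPy sketch of SVD-based LoRA extraction with opportunistic rank reduction. It is not the code in this PR: the helper name `extract_lora`, its parameters, and in particular the rule of comparing each singular value against `sv_epsilon` times the largest one are assumptions made for illustration. The actual flag may apply its tolerance differently.

```python
import numpy as np


def extract_lora(w_base, w_tuned, max_rank, sv_epsilon=0.0):
    """Return (lora_a, lora_b) such that lora_b @ lora_a approximates w_tuned - w_base.

    Hypothetical helper; the relative-threshold rule is an assumption.
    """
    delta = w_tuned - w_base
    # Thin SVD: delta = u @ diag(s) @ vh, singular values sorted descending.
    u, s, vh = np.linalg.svd(delta, full_matrices=False)
    rank = min(max_rank, s.shape[0])
    if sv_epsilon > 0 and s[0] > 0:
        # Count singular values above the tolerance, relative to the largest.
        significant = int(np.sum(s > sv_epsilon * s[0]))
        rank = max(1, min(rank, significant))
    # Split each singular value evenly between the two factors.
    sqrt_s = np.sqrt(s[:rank])
    lora_b = u[:, :rank] * sqrt_s          # shape (out_features, rank)
    lora_a = sqrt_s[:, None] * vh[:rank]   # shape (rank, in_features)
    return lora_a, lora_b


rng = np.random.default_rng(0)
w_base = rng.standard_normal((64, 32))
# Simulate a fine-tune whose weight difference is exactly rank 4.
true_delta = rng.standard_normal((64, 4)) @ rng.standard_normal((4, 32))
w_tuned = w_base + true_delta

a_full, b_full = extract_lora(w_base, w_tuned, max_rank=16)
a_small, b_small = extract_lora(w_base, w_tuned, max_rank=16, sv_epsilon=1e-6)
print(a_full.shape, a_small.shape)
```

With the tolerance set, the extracted adapter keeps only the directions that carry signal, so a rank-4 difference yields a rank-4 adapter even though up to 16 were allowed.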

Also reimplement a couple of merge methods using the `@easy_define` decorator and add some missing tests.


cg123 added 13 commits February 6, 2025 12:57

Remove nearswap & SCE

fa5984a

Axe consensus ta/ties

b2a2b8b

Add l1/l2/linf normalization

1cd5848

Reimplement SCE & nearswap

2108787

Tests

9ef7d95

Remove DELLA

3106c42

Reimplement DELLA

d5aebdb

Formatting

959a566

Remove extract-lora

d91b642

First pass at graph-based mergekit-extract-lora

5783dfc

Add warning for vocab size, better embed handling, use CUDA, sv threshold, correctness

b8e7bbe

Fix argument parsing

e81fbfa

Unbork TOC

683e54d

cg123 merged commit a2dda31 into main Feb 7, 2025
8 checks passed

cg123 deleted the rewrites branch February 7, 2025 00:51
