mergekit-multi: execute DAG of merge configurations (arcee-ai#506)
Given a set of recipes, schedule and execute them using the
`Task`/`Executor` infrastructure.
cg123 authored Feb 8, 2025
1 parent ddd2352 commit 30b67a2
Showing 6 changed files with 362 additions and 4 deletions.
87 changes: 87 additions & 0 deletions docs/multimerge.md
@@ -0,0 +1,87 @@
# mergekit-multi: Multi-Stage Model Merging

## What is mergekit-multi?

`mergekit-multi` is a command-line tool for executing complex model merging workflows with multiple interdependent stages. It allows you to:

1. Chain multiple merge operations together
2. Use outputs from previous merges as inputs to subsequent ones
3. Automatically handle dependencies between merge steps
4. Cache intermediate results for faster re-runs

## Usage

Basic command structure:
```bash
# --out-path applies only when the config contains an unnamed (final) merge
mergekit-multi <config.yaml> \
    --intermediate-dir ./intermediates \
    [--out-path ./final-merge] \
    [options]
```

## Configuration File Format

Create a YAML file containing multiple merge configurations separated by `---`. Each configuration should contain:

- `name`: Unique identifier for the merge. Every intermediate merge needs one; only the final merge may omit it.
- Standard mergekit configuration parameters

### Example with Final Merge (`multimerge.yaml`)

```yaml
name: first-merge
merge_method: linear
models:
  - model: mistralai/Mistral-7B-v0.1
  - model: BioMistral/BioMistral-7B
parameters:
  weight: 0.5
---
name: second-merge
merge_method: slerp
base_model: first-merge # Reference previous merge
models:
  - model: NousResearch/Hermes-2-Pro-Mistral-7B
parameters:
  t: 0.5
---
# Final merge (no name)
merge_method: dare_ties
base_model: mistralai/Mistral-7B-v0.1
models:
  - model: second-merge
    parameters:
      density: 0.6
      weight: 0.5
  - model: teknium/OpenHermes-2.5-Mistral-7B
    parameters:
      density: 0.8
      weight: 0.5
```
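
In the example above, `base_model: first-merge` and `model: second-merge` refer to earlier merges by name rather than to Hugging Face repositories. The sketch below shows one way such references could be resolved before a stage runs. It is not mergekit's implementation: `resolve_references` is a hypothetical helper, and it assumes (consistent with "How It Works" below) that each named merge is written to `<intermediate-dir>/<name>`, while any other model string is left untouched.

```python
import os
from typing import Iterable


def resolve_references(config: dict, merge_names: Iterable[str], intermediate_dir: str) -> dict:
    """Return a copy of `config` whose references to other merges point at their outputs.

    Hypothetical helper: assumes each named merge is saved to
    <intermediate_dir>/<name>; other model strings (repo IDs, local paths)
    are passed through unchanged.
    """
    names = set(merge_names)

    def resolve(ref: str) -> str:
        return os.path.join(intermediate_dir, ref) if ref in names else ref

    resolved = dict(config)  # shallow copy so the caller's config is not mutated
    if "base_model" in resolved:
        resolved["base_model"] = resolve(resolved["base_model"])
    if "models" in resolved:
        resolved["models"] = [
            {**entry, "model": resolve(entry["model"])} for entry in resolved["models"]
        ]
    return resolved


# The second stage of the example, already parsed into a dict.
second_merge = {
    "name": "second-merge",
    "merge_method": "slerp",
    "base_model": "first-merge",
    "models": [{"model": "NousResearch/Hermes-2-Pro-Mistral-7B"}],
    "parameters": {"t": 0.5},
}
resolved = resolve_references(second_merge, ["first-merge", "second-merge"], "./intermediates")
print(resolved["base_model"])
```

Only exact matches against declared merge names are rewritten, so a merge name that collides with a real repository ID would shadow that repository; unique, descriptive names avoid the ambiguity.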

### Example with All Named Merges

```yaml
name: first-merge
merge_method: task_arithmetic
...
---
name: second-merge
merge_method: slerp
...
---
name: third-merge
merge_method: linear
...
```

## Key Options

- `--intermediate-dir`: Directory for storing intermediate merge results
- `--out-path`: Output path for the final merge (applies only when one merge has no `name`)
- `--lazy/--no-lazy`: Skip intermediate merges whose output already exists (default: `--lazy`)
- Standard mergekit options also apply (e.g., `--cuda`, `--out-shard-size`, `--multi-gpu`)

## How It Works

When you run `mergekit-multi`, it topologically sorts your merge configurations to determine the correct order of execution. The merges are then processed sequentially, using outputs from previous steps as inputs for subsequent ones as needed.
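
The ordering step can be illustrated with Python's standard-library `graphlib.TopologicalSorter`. This is a sketch, not mergekit's code (the commit schedules the work through mergekit's own `Task`/`Executor` infrastructure). It assumes each configuration is already parsed into a dict, treats any `model` or `base_model` value that matches another merge's `name` as a dependency, and uses a hypothetical `"<final>"` key for the unnamed merge.

```python
from graphlib import TopologicalSorter
from typing import Dict, List

FINAL = "<final>"  # hypothetical placeholder key for the unnamed (final) merge


def referenced_models(config: dict) -> List[str]:
    """Collect base_model and every models[].model entry of one config."""
    refs = [entry["model"] for entry in config.get("models", [])]
    if config.get("base_model"):
        refs.append(config["base_model"])
    return refs


def execution_order(configs: List[dict]) -> List[str]:
    """Return merge keys ordered so each merge runs after the merges it references."""
    by_name: Dict[str, dict] = {}
    for config in configs:
        key = config.get("name", FINAL)
        if key in by_name:
            raise ValueError(f"duplicate merge name: {key}")
        by_name[key] = config

    sorter = TopologicalSorter()
    for key, config in by_name.items():
        # Only references to other merges in the file are dependencies;
        # anything else (e.g. a Hugging Face repo ID) is an external input.
        deps = [ref for ref in referenced_models(config) if ref in by_name]
        sorter.add(key, *deps)
    # static_order() raises graphlib.CycleError on circular references.
    return list(sorter.static_order())


# The three stages of the example configuration, listed out of order on purpose.
configs = [
    {"base_model": "mistralai/Mistral-7B-v0.1",
     "models": [{"model": "second-merge"}, {"model": "teknium/OpenHermes-2.5-Mistral-7B"}]},
    {"name": "second-merge", "base_model": "first-merge",
     "models": [{"model": "NousResearch/Hermes-2-Pro-Mistral-7B"}]},
    {"name": "first-merge",
     "models": [{"model": "mistralai/Mistral-7B-v0.1"}, {"model": "BioMistral/BioMistral-7B"}]},
]
print(execution_order(configs))  # ['first-merge', 'second-merge', '<final>']
```

Because the example forms a simple chain, only one valid order exists; configurations with independent branches may have several valid orders, and any of them respects the dependencies.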

All intermediate merges are saved in the `--intermediate-dir` directory under their configured names. By default, the tool skips any merge whose output already exists; to force every merge to run again, pass `--no-lazy`.
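
A minimal sketch of the lazy-skip behavior, under stated assumptions: `run_merges` and `fake_merge` are hypothetical names, an existing output directory is assumed to count as a finished merge (the real tool's check may differ), and only named intermediate merges are covered, since the final merge is written to `--out-path` instead.

```python
import os
import tempfile
from typing import Callable, List


def run_merges(
    order: List[str],
    intermediate_dir: str,
    merge_fn: Callable[[str, str], None],
    lazy: bool = True,
) -> List[str]:
    """Run named merges in order; with lazy=True, skip outputs that already exist.

    Returns the names of the merges that were actually executed.
    """
    executed = []
    for name in order:
        out_path = os.path.join(intermediate_dir, name)
        # Assumption: an existing output directory counts as a finished merge.
        if lazy and os.path.exists(out_path):
            continue
        merge_fn(name, out_path)
        executed.append(name)
    return executed


def fake_merge(name: str, out_path: str) -> None:
    """Stand-in for a real merge: only creates the output directory."""
    os.makedirs(out_path, exist_ok=True)


with tempfile.TemporaryDirectory() as tmp:
    order = ["first-merge", "second-merge"]
    first_run = run_merges(order, tmp, fake_merge)                # runs both merges
    second_run = run_merges(order, tmp, fake_merge)               # skips both
    forced_run = run_merges(order, tmp, fake_merge, lazy=False)   # like --no-lazy
```

One consequence of a pure existence check is that a merge interrupted after creating its output directory would be treated as complete on the next lazy run; `--no-lazy` is the way to rebuild in that situation.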
6 changes: 6 additions & 0 deletions mergekit/common.py
@@ -166,6 +166,12 @@ def validate_string(cls, value):

@model_serializer()
def serialize(self):
+ if self.override_architecture is not None:
+ return {
+ "model": self.model,
+ "lora": self.lora,
+ "override_architecture": self.override_architecture,
+ }
res = str(self)
if '"' in res or " " in res:
return self
7 changes: 4 additions & 3 deletions mergekit/graph.py
@@ -151,6 +151,7 @@ def __init__(
def run(
self,
quiet: bool = False,
+ desc: Optional[str] = None,
) -> Iterator[Tuple[Task, Any]]:
"""
Execute the computed schedule and yield the target values.
@@ -177,7 +178,7 @@ def run(
pbar := tqdm.tqdm(
list(enumerate(self.schedule)),
disable=quiet,
- desc="Executing graph",
+ desc=desc or "Executing graph",
)
):
use_math_device = task.uses_accelerator()
@@ -215,11 +216,11 @@ def run(
del values
del pbar

- def execute(self) -> None:
+ def execute(self, desc: Optional[str] = None) -> None:
"""
Execute all tasks and discard results.
"""
- for task, value in self.run():
+ for task, value in self.run(desc=desc):
pass

def _move_tensors(
2 changes: 1 addition & 1 deletion mergekit/options.py
@@ -12,7 +12,7 @@
from mergekit.common import parse_kmb


- class MergeOptions(BaseModel):
+ class MergeOptions(BaseModel, frozen=True):
allow_crimes: bool = False
transformers_cache: Optional[str] = None
lora_merge_cache: Optional[str] = None