Agent Traces Pipeline #565

Draft: wants to merge 37 commits into base: main

37 commits
e7df036
Start agent traces
aymeric-roucher Feb 24, 2025
6d0963e
Working local version with o1
aymeric-roucher Feb 25, 2025
a6f5a15
Update api addr
aymeric-roucher Feb 26, 2025
38bfa93
Increase concurrent requests
aymeric-roucher Feb 26, 2025
7d9fc6e
Update sbatch params
aymeric-roucher Feb 26, 2025
7a1fb98
Add conda activation
aymeric-roucher Feb 26, 2025
1a7becf
Use local model
aymeric-roucher Feb 26, 2025
f35337e
128 concurrent
aymeric-roucher Feb 26, 2025
28bc464
Log
aymeric-roucher Feb 26, 2025
319ae52
Add conda init
aymeric-roucher Feb 26, 2025
69d55f6
Fix slurm script
aymeric-roucher Feb 26, 2025
c8aa2c4
Add await
aymeric-roucher Feb 26, 2025
6df6161
Try fixing async func
aymeric-roucher Feb 26, 2025
b402450
Add stop sequences
aymeric-roucher Feb 26, 2025
b2996c1
Add port
aymeric-roucher Feb 27, 2025
f6f138b
Make synchronous
aymeric-roucher Feb 28, 2025
23c2128
Small adapts to script
aymeric-roucher Feb 28, 2025
52ac4e2
More detailed error logging
aymeric-roucher Feb 28, 2025
0adc082
Even more detailed request error logging
aymeric-roucher Feb 28, 2025
884c8e9
Reduce context length
aymeric-roucher Feb 28, 2025
64ae551
Add token counting
aymeric-roucher Feb 28, 2025
2e7d1da
Fix message roles an add token counting
aymeric-roucher Feb 28, 2025
7bcb96e
Add dummy completion
aymeric-roucher Feb 28, 2025
28afbef
Test
aymeric-roucher Feb 28, 2025
5ed2005
Running with gpt-4o
aymeric-roucher Feb 28, 2025
ce7d8bd
Update timeouts
aymeric-roucher Feb 28, 2025
6a9db1b
Adjust
aymeric-roucher Feb 28, 2025
e245aa0
Flatten messages
aymeric-roucher Feb 28, 2025
b6de9cb
Prompt more around testing the function
aymeric-roucher Feb 28, 2025
9cdf0d9
Improve explanations in prompt
aymeric-roucher Feb 28, 2025
ef3f888
Also store final outputs
aymeric-roucher Mar 13, 2025
91e4dc1
wip(generate + eda): working generation + add initial eda
baptistecolle Mar 31, 2025
5d7205d
feat(eda): uploaded dataset for training
baptistecolle Mar 31, 2025
c1cea15
feat(train): added training recipe for agentic traces
baptistecolle Mar 31, 2025
8cc3983
fix(deps): fix smolagent dep
baptistecolle Mar 31, 2025
3b021de
fix(deps): fix smolagent dep
baptistecolle Mar 31, 2025
2fbac03
fix: remove uncessary changes
baptistecolle Mar 31, 2025
63 changes: 63 additions & 0 deletions agentic-traces/README.md
@@ -0,0 +1,63 @@
# Generate agent traces

## Step 1: Install (set up the environment)

```bash
make install
```

```bash
source openr1/bin/activate
uv pip install -e ".[smolagents,jupyter]"
```

## Step 2: Start the R1 server

Before submitting the jobs, do not forget to add the router address to the `serve_r1.slurm` file.

```bash
sbatch slurm/serve_router.slurm
sbatch slurm/serve_r1.slurm
```

## Step 3: Generate traces

This takes ~3 days to complete.

```bash
sbatch slurm/agentic_generation.slurm
```

## Step 4: Process the traces and upload dataset to the hub

This step is done in a Jupyter notebook for ease of use during development.

Follow the instructions in `eda.ipynb` to process the traces into a training dataset.
The notebook filters out the failed generation traces, then uploads the dataset to the Hub for later use.

**TODO:**
- filter the traces to keep traces that pass the test cases
- filter by length of the generation, so traces that converge quickly are favoured.

**Remarks:**
Right now, the `generate_agent_traces.py` file seems to be buggy: it does not generate a single correct trace. By correct, I mean a trace that passes the test cases.

The dataset can be found at https://huggingface.co/datasets/baptistecolle/codeforces-agentic-generations

## Step 5: Train on the traces and upload the model to the hub

```bash
sbatch --nodes=1 --time=8:00:00 slurm/train.slurm Qwen2.5-1.5B-Instruct sft demo_agentic_trace zero3 '--per_device_train_batch_size=1 --num_train_epochs=5'
```

The trained model can be found at https://huggingface.co/baptistecolle/Qwen2.5-1.5B-Open-R1-Distill-Agentic-Trace

## Step 6: Test the model
I believe the `generate_agent_traces.py` file needs to be fixed before the model can be tested (see the TODO: "`generate_agent_traces.py` file is not working").

**TODO:** create some custom metrics in lighteval for the agentic traces.
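As a starting point for such a metric, the core scoring logic could look like the sketch below. It is plain Python and deliberately does not use any lighteval API; the function names and the `(expected, actual)` input layout are assumptions made for illustration.

```python
def outputs_match(expected: str, actual: str) -> bool:
    """Compare two program outputs line by line, ignoring extra whitespace."""
    expected_lines = [line.split() for line in expected.strip().splitlines()]
    actual_lines = [line.split() for line in actual.strip().splitlines()]
    return expected_lines == actual_lines


def pass_rate(results) -> float:
    """Fraction of problems whose test cases all pass.

    `results` is a list with one entry per problem; each entry is a list of
    (expected_output, actual_output) string pairs (hypothetical layout).
    """
    if not results:
        return 0.0
    solved = sum(
        1
        for cases in results
        if cases and all(outputs_match(exp, act) for exp, act in cases)
    )
    return solved / len(results)
```

A problem with no test cases counts as unsolved here, which avoids inflating the score when test data is missing.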

# TODOs:
- **The `generate_agent_traces.py` file is not working**: most trace generations fail. Furthermore, based on the EDA (exploratory data analysis), none of the generated traces actually pass the test cases: almost all traces end with `Error:\\nReached max steps.`
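
To quantify how many traces hit the step limit while debugging, a small tally like the one below may help. It is a sketch that assumes each trace exposes its final output as a string (or `None`); it matches only on the `Reached max steps.` text so that it works whether the preceding `\n` is stored as a real newline or as an escaped sequence.

```python
from collections import Counter

# Marker taken from the error text observed at the end of failing traces.
MAX_STEPS_MARKER = "Reached max steps."


def classify_final_output(final_output):
    """Bucket a trace by how it ended (hypothetical categories)."""
    if final_output is None or not final_output.strip():
        return "empty"
    if MAX_STEPS_MARKER in final_output:
        return "max_steps"
    return "other"


def summarize(final_outputs):
    """Count how many traces fall into each bucket."""
    return Counter(classify_final_output(output) for output in final_outputs)
```

Traces in the `other` bucket are the ones worth checking against the test cases first, since they at least produced a final answer.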

# Current status
- The pipeline is in place; it now needs to be debugged to improve performance.