We release FastCuRL-1.5B-V3 and FastCuRL-1.5B-V2.
We release FastCuRL-1.5B-Preview, a slow-thinking reasoning model that outperforms 📈 the previous SoTA DeepScaleR-1.5B-Preview with 🚀 only 50% of its training steps! We propose a curriculum RL framework with stage-wise context scaling that achieves efficient training and concise CoT reasoning on top of DeepSeek-R1-Distill-Qwen-1.5B, and we observe continuous performance improvement as training steps increase. To make our work easier to reproduce and to advance research progress, we open-source our code, model, and data.
Model | Training Steps | Training Stages | Number of GPUs Used in Each Stage |
---|---|---|---|
DeepScaleR-1.5B-Preview | ~1,750 | 3 | 8, 16, 32 |
FastCuRL-1.5B-Preview | ~860 | 4 | 8, 8, 8, 8 |
FastCuRL-1.5B-V2 | ~1,710 | 5 | 8, 8, 8, 8, 8 |
FastCuRL-1.5B-V3 | ~2,620 | 5 | 8, 8, 8, 8, 8 |
For counting training steps, we normalize all runs to a batch size of 128: two steps with a batch size of 64 are counted as one step with a batch size of 128.
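Concretely, the normalization is just a rescaling by the batch-size ratio; a trivial helper (the function name is ours, for illustration):

```python
def normalized_steps(raw_steps: int, batch_size: int, reference_bs: int = 128) -> float:
    """Convert raw optimizer steps to equivalent steps at the reference batch size."""
    return raw_steps * batch_size / reference_bs

# Two steps at batch size 64 count as one step at batch size 128.
assert normalized_steps(2, 64) == 1.0
```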
We report Pass@1 accuracy averaged over 16 samples for each problem; a short scoring sketch follows the table.
Model | AIME 2024 | MATH 500 | AMC 2023 | Minerva Math | OlympiadBench | Avg. |
---|---|---|---|---|---|---|
Qwen2.5-Math-7B-Instruct | 13.3 | 79.8 | 50.6 | 34.6 | 40.7 | 43.8 |
rStar-Math-7B | 26.7 | 78.4 | 47.5 | - | 47.1 | - |
Eurus-2-7B-PRIME | 26.7 | 79.2 | 57.8 | 38.6 | 42.1 | 48.9 |
Qwen2.5-7B-SimpleRL | 26.7 | 82.4 | 62.5 | 39.7 | 43.3 | 50.9 |
DeepSeek-R1-Distill-Qwen-1.5B | 28.8 | 82.8 | 62.9 | 26.5 | 43.3 | 48.9 |
Still-1.5B | 32.5 | 84.4 | 66.7 | 29.0 | 45.4 | 51.6 |
DeepScaleR-1.5B-Preview | 43.1 | 87.8 | 73.6 | 30.2 | 50.0 | 57.0 |
FastCuRL-1.5B-Preview | 43.1 | 88.0 | 74.2 | 31.6 | 50.4 | 57.5 |
FastCuRL-1.5B-V2 | 47.5 | 89.3 | 77.0 | 32.8 | 53.3 | 60.0 |
FastCuRL-1.5B-V3 | 49.6 | 90.5 | 78.5 | 34.7 | 54.5 | 61.6 |
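The averaging itself is straightforward; here is a minimal sketch with dummy data (the real pipeline scores model generations against reference answers):

```python
import numpy as np

# correct[i, j] = 1 if sample j for problem i matches the reference answer.
rng = np.random.default_rng(0)
correct = rng.integers(0, 2, size=(30, 16))  # dummy data: 30 problems x 16 samples

pass_at_1 = correct.mean(axis=1)        # per-problem accuracy over 16 samples
print("Avg Pass@1:", pass_at_1.mean())  # the style of number reported above
```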
```bash
# Install a Python 3.10 environment.
conda create -n rllm python=3.10 -y
conda activate rllm

# Install RLLM dependencies.
cd rllm
pip install -e ./verl
pip install -e .
```
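To confirm the editable installs succeeded, a quick sanity check (a minimal sketch; torch and vllm are assumed to be pulled in as dependencies of the packages above):

```python
# Import the key packages and print where they resolved from.
import torch
import vllm
import verl

print("torch:", torch.__version__)
print("vllm:", vllm.__version__)
print("verl installed at:", verl.__file__)
```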
Following DeepScaleR, our training dataset consists of 40,315 unique problem-answer pairs compiled from the sources below (a quick inspection snippet follows the list):
- AIME problems (1984-2023)
- AMC problems (before 2023)
- Omni-MATH dataset
- Still dataset
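For a quick look at the compiled data, something like the following works; note that the parquet path and column names here are assumptions for illustration, not the repository's actual schema:

```python
import pandas as pd

# Hypothetical path and columns -- adjust to the files shipped with the repo.
df = pd.read_parquet("./fastcurl/data/train/train.parquet")
print(f"{len(df)} problem-answer pairs")  # expected: 40,315 unique pairs
print(df.head())                          # eyeball a few records
```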
```bash
export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
export VLLM_ATTENTION_BACKEND=XFORMERS

# Stage 1: 8K context length, 160 steps
bash ./scripts/train/run_fastcurl_1.5b_8k_stage1.sh | tee -a fastcurl-1.5b-stage1.log

# Stage 2: 16K context length, 590 steps
bash ./scripts/train/run_fastcurl_1.5b_16k_stage2.sh | tee -a fastcurl-1.5b-stage2.log

# Stage 3: 24K context length, 230 steps
bash ./scripts/train/run_fastcurl_1.5b_24k_stage3.sh | tee -a fastcurl-1.5b-stage3.log

# Stage 4: 16K context length, 580 steps
bash ./scripts/train/run_fastcurl_1.5b_16k_stage4.sh | tee -a fastcurl-1.5b-stage4.log
```
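For reference, the curriculum these four scripts implement can be summarized as a simple stage schedule. The sketch below is illustrative (the dict keys are ours; the context lengths and step counts come from the commands above, and the real settings live in each run_fastcurl_1.5b_*.sh script):

```python
# FastCuRL-1.5B-Preview's stage-wise context scaling, as driven by the scripts above.
STAGES = [
    {"stage": 1, "max_context": 8 * 1024,  "steps": 160},
    {"stage": 2, "max_context": 16 * 1024, "steps": 590},
    {"stage": 3, "max_context": 24 * 1024, "steps": 230},
    {"stage": 4, "max_context": 16 * 1024, "steps": 580},
]

for s in STAGES:
    print(f"Stage {s['stage']}: {s['max_context']}-token context for {s['steps']} steps")
```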
```bash
python3 -m verl.trainer.main_generation \
    trainer.nnodes=1 \
    trainer.n_gpus_per_node=8 \
    data.path=./fastcurl/data/test/xxx.parquet \
    data.output_path=${OUTPUT_DIR}/xxx.parquet \
    data.n_samples=16 \
    data.batch_size=2048 \
    model.path=${MODEL_PATH} \
    rollout.temperature=0.6 \
    rollout.response_length=32768 \
    rollout.top_k=-1 \
    rollout.top_p=1 \
    rollout.gpu_memory_utilization=0.9 \
    rollout.tensor_model_parallel_size=1
```
```bibtex
@misc{fastcurl,
      title={FastCuRL: Curriculum Reinforcement Learning with Stage-wise Context Scaling for Efficient Training R1-like Reasoning Models},
      author={Mingyang Song and Mao Zheng and Zheng Li and Wenjie Yang and Xuan Luo and Yue Pan and Feng Zhang},
      year={2025},
      eprint={2503.17287},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2503.17287},
}
```
- Our model is trained on top of DeepSeek-R1-Distill-Qwen-1.5B.
- Our training experiments are powered by our heavily modified fork of verl.
- We directly use DeepScaleR's code for our experiments, modifying only the parts with naming conflicts to avoid confusion.