veRL: Volcano Engine Reinforcement Learning for LLM

veRL is a flexible, efficient and production-ready RL training framework designed for large language models (LLMs).

veRL is the open-source version of the paper HybridFlow: A Flexible and Efficient RLHF Framework.

veRL is flexible and easy to use with:

  • Easy extension of diverse RL algorithms: The hybrid programming model combines the strengths of single-controller and multi-controller paradigms to enable flexible representation and efficient execution of complex post-training dataflows, allowing users to build RL dataflows in a few lines of code (see the sketch after this list).

  • Seamless integration of existing LLM infra with modular APIs: Decouples computation and data dependencies, enabling seamless integration with existing LLM frameworks, such as PyTorch FSDP, Megatron-LM and vLLM. Moreover, users can easily extend to other LLM training and inference frameworks.

  • Flexible device mapping: Supports various placement of models onto different sets of GPUs for efficient resource utilization and scalability across different cluster sizes.

  • Ready integration with popular Hugging Face models
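
To make the hybrid programming model concrete, here is a minimal, hypothetical sketch of the single-controller view of a PPO dataflow: one driver function sequences generation, reward scoring, and updates, while each worker group runs multi-controller SPMD code internally. The role and method names (actor_rollout, compute_reward, and so on) are illustrative assumptions, not veRL's actual API.

def ppo_step(batch, actor_rollout, critic, reward_model, reference):
    # Generation phase: the actor produces responses for a batch of prompts.
    rollouts = actor_rollout.generate_sequences(batch)
    # Preparation phase: score responses, then compute reference log-probs
    # (for the KL penalty) and value estimates.
    rollouts["rewards"] = reward_model.compute_reward(rollouts)
    rollouts["ref_log_probs"] = reference.compute_log_probs(rollouts)
    rollouts["values"] = critic.compute_values(rollouts)
    # Training phase: update the critic and the actor on the experience.
    critic.update(rollouts)
    actor_rollout.update_actor(rollouts)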

veRL is fast with:

  • State-of-the-art throughput: By seamlessly integrating existing SOTA LLM training and inference frameworks, veRL achieves high generation and training throughput.

  • Efficient actor model resharding with 3D-HybridEngine: Eliminates memory redundancy and significantly reduces communication overhead during transitions between training and generation phases (a toy illustration follows this list).
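
As a toy illustration of the resharding problem (not the 3D-HybridEngine itself): training and generation may shard the same weight over different tensor-parallel degrees, so the transition regroups existing shards instead of materializing a redundant full copy of the model.

import torch

weight = torch.randn(8, 4)                   # one full parameter
train_shards = list(torch.chunk(weight, 4))  # training layout: TP=4
# Switch to a generation layout with TP=2 by concatenating neighbouring
# training shards, so no rank ever holds a second full copy.
gen_shards = [torch.cat(train_shards[i:i + 2]) for i in (0, 2)]
assert torch.equal(torch.cat(gen_shards), weight)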

| Documentation | Paper | Slack | WeChat |

Key Features

  • FSDP and Megatron-LM for training.
  • vLLM and TGI for rollout generation; SGLang support is coming soon.
  • Hugging Face model support.
  • Supervised fine-tuning.
  • Reward model training.
  • Reinforcement learning from human feedback with PPO (the clipped objective is sketched after this list).
  • Flash-attention integration and sequence packing.
  • Scales up to 70B models and hundreds of GPUs.
  • Experiment tracking with wandb and mlflow.
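
For reference, the PPO objective mentioned above is the standard clipped surrogate loss; the sketch below is generic PPO math and makes no assumption about veRL's internal implementation.

import torch

def ppo_clip_loss(log_probs, old_log_probs, advantages, clip_ratio=0.2):
    # Probability ratio between the current policy and the rollout policy.
    ratio = torch.exp(log_probs - old_log_probs)
    # Clipping keeps a single update from moving the policy too far.
    clipped = torch.clamp(ratio, 1.0 - clip_ratio, 1.0 + clip_ratio)
    # Maximize the surrogate objective, i.e. minimize its negation.
    return -torch.min(ratio * advantages, clipped * advantages).mean()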

Getting Started

Check out this Jupyter notebook to get started with PPO training on a single 24GB L4 GPU (free GPU quota provided by Lightning Studio)!

Quickstart:

Running a PPO example step by step (see the launch sketch below):

Reproducible algorithm baselines:

For code explanation and advanced usage (extension):
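
As a hedged launch sketch, assuming the verl.trainer.main_ppo entry point with Hydra-style overrides as described in the Quickstart; the exact config keys can differ between releases, and the data paths and model name below are placeholders, not prescribed values.

import subprocess

subprocess.run(
    [
        "python3", "-m", "verl.trainer.main_ppo",
        # Hypothetical placeholders: point these at your own parquet files.
        "data.train_files=data/train.parquet",
        "data.val_files=data/test.parquet",
        # Any Hugging Face causal LM checkpoint should work here.
        "actor_rollout_ref.model.path=Qwen/Qwen2.5-0.5B-Instruct",
        "trainer.n_gpus_per_node=1",
    ],
    check=True,
)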

Citation

If you find the project helpful, please cite:

@article{sheng2024hybridflow,
  title   = {HybridFlow: A Flexible and Efficient RLHF Framework},
  author  = {Guangming Sheng and Chi Zhang and Zilingfeng Ye and Xibin Wu and Wang Zhang and Ru Zhang and Yanghua Peng and Haibin Lin and Chuan Wu},
  year    = {2024},
  journal = {arXiv preprint arXiv:2409.19256}
}

Publications Using veRL