
feature(whl): add rlhf pipeline. #748


Open · wants to merge 18 commits into main
Conversation

@kxzxvbk (Contributor) commented Nov 6, 2023

Description

Related Issue

TODO

Check List

  • merge the latest version of the source branch/repo and resolve all conflicts
  • pass the style check
  • pass all tests

@PaParaZz1 added the enhancement (New feature or request) and algo (Add new algorithm or improve old one) labels on Nov 6, 2023
@@ -18,6 +19,7 @@
from .model import PPOFModel
from .config import get_instance_config, get_instance_env, get_hybrid_shape
from ding.bonus.common import TrainingReturn, EvalReturn
from ..framework.middleware.collector import ChatCollector
Member: merge it into ding.framework

"""
Overview:
The class of the collector running by steps, including model inference and transition \
process. Use the `__call__` method to execute the whole collection process.
Member: why is there extra indentation here?


def top_p_logits(logits, topp=0.9, filter_value=0, min_topk=1):
"""
Filter a distribution of logits using nucleus (top-p) filtering
Member: polish the comments and add a unit test

if topp > 0:
logits_sorted, inds = torch.sort(logits, dim=-1, descending=True)
mask = (logits_sorted.cumsum(dim=-1) - logits_sorted) >= topp
mask[:, :min_topk] = False
Member: use `mask[..., :min_topk]` here
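A minimal sketch of how the whole filter could look with the suggested `...` indexing, plus a small self-check in the spirit of the "add a unit test" comment. This is an illustrative rewrite based on the snippet above, not the final implementation in this PR; note that the cumsum-vs-`topp` comparison implies the input is already a probability distribution.

```python
import torch


def top_p_logits(logits: torch.Tensor, topp: float = 0.9, filter_value: float = 0., min_topk: int = 1) -> torch.Tensor:
    """
    Nucleus (top-p) filtering: keep the smallest set of highest-probability tokens whose
    cumulative mass reaches ``topp`` and overwrite every other entry with ``filter_value``.
    """
    if topp <= 0:
        return logits
    logits_sorted, inds = torch.sort(logits, dim=-1, descending=True)
    # Mask tokens whose preceding cumulative mass already reaches topp.
    mask = (logits_sorted.cumsum(dim=-1) - logits_sorted) >= topp
    # Always keep at least ``min_topk`` tokens; ``...`` keeps this valid for any batch shape.
    mask[..., :min_topk] = False
    filtered_sorted = logits_sorted.masked_fill(mask, filter_value)
    # Undo the sort so the filtered values return to their original token positions.
    return torch.empty_like(logits).scatter_(-1, inds, filtered_sorted)


def test_top_p_logits():
    probs = torch.tensor([[0.5, 0.3, 0.1, 0.1]])
    out = top_p_logits(probs, topp=0.8)
    # 0.5 + 0.3 already covers the 0.8 nucleus, so the two tail tokens are zeroed.
    assert torch.allclose(out, torch.tensor([[0.5, 0.3, 0.0, 0.0]]))
```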

@@ -1,4 +1,7 @@
from typing import Union, Dict, Optional

Member: move these modifications into a single new file: lm_vac.py


def __init__(self, config, opt, tokenizer):
super().__init__(config)
self.opt = opt
Member: why is `opt` defined here?

else:
logits = self.reward_head(output.last_hidden_state).squeeze(-1)

return (logits, )
Member: why return a tuple here?

self._init_flag = False

def reset(self):
self.last_batch = next(self.generator)
Member: Do you need to restart the generator here?
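For reference, a hypothetical sketch of what re-creating the generator inside `reset()` could look like, so repeated resets restart from the top of the data rather than resuming the old iteration state. The `_dataloader` attribute and the standalone class are illustrative assumptions, not the collector in this PR.

```python
from typing import Any, Iterator

import torch
from torch.utils.data import DataLoader, TensorDataset


class CollectorResetSketch:
    """Minimal stand-in showing only the reset logic under discussion."""

    def __init__(self, dataloader: DataLoader) -> None:
        self._dataloader = dataloader
        self.generator: Iterator[Any] = iter(self._dataloader)
        self.last_batch: Any = None

    def reset(self) -> None:
        # Rebuild the generator on every reset so the collector does not keep
        # consuming the iterator left over from the previous collection round.
        self.generator = iter(self._dataloader)
        self.last_batch = next(self.generator)


# Repeated resets are safe: each call re-primes the generator from the start.
loader = DataLoader(TensorDataset(torch.arange(4)), batch_size=2)
collector = CollectorResetSketch(loader)
collector.reset()
collector.reset()
```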


class LlamaRewardModel(LlamaForCausalLM):

def __init__(self, config, opt, tokenizer):
Member: Should we move the creation of the tokenizer inside the constructor of the RM?
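A hypothetical sketch of that suggestion: construct the tokenizer inside the reward model from a pretrained path and return the per-token rewards as a plain tensor (which would also resolve the tuple-return question above). The `AutoTokenizer.from_pretrained` call and the `opt.model_path` field are assumptions for illustration, not the PR's actual interface.

```python
from typing import Optional

import torch
import torch.nn as nn
from transformers import AutoTokenizer, LlamaForCausalLM


class LlamaRewardModelSketch(LlamaForCausalLM):
    """Illustrative variant that builds its own tokenizer instead of taking one as an argument."""

    def __init__(self, config, opt):
        super().__init__(config)
        # Create the tokenizer here, as the review suggests; ``opt.model_path`` is an
        # assumed config field pointing at the pretrained checkpoint.
        self.tokenizer = AutoTokenizer.from_pretrained(opt.model_path)
        self.reward_head = nn.Linear(config.hidden_size, 1, bias=False)

    def forward(self, input_ids: torch.Tensor, attention_mask: Optional[torch.Tensor] = None) -> torch.Tensor:
        # ``self.model`` is the Llama backbone provided by ``LlamaForCausalLM``.
        output = self.model(input_ids=input_ids, attention_mask=attention_mask)
        # Return a plain (batch, seq_len) tensor of per-token rewards rather than a one-element tuple.
        return self.reward_head(output.last_hidden_state).squeeze(-1)
```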

@@ -0,0 +1,50 @@
from easydict import EasyDict
Member: move it to dizoo/chat/entry

codecov bot commented Jan 3, 2024

Codecov Report

Attention: 252 lines in your changes are missing coverage. Please review.

Comparison is base (d7a61c2) 76.78% compared to head (f3a8245) 76.83%.

Files                                        Patch %   Lines
ding/model/template/lm_vac.py                 20.00%   92 Missing ⚠️
ding/policy/ppof.py                            5.74%   82 Missing ⚠️
ding/framework/middleware/collector.py        15.62%   27 Missing ⚠️
ding/rl_utils/gae.py                          11.11%   16 Missing ⚠️
ding/reward_model/language_reward_model.py    31.57%   13 Missing ⚠️
ding/bonus/ppof.py                             0.00%   12 Missing ⚠️
ding/bonus/config.py                           0.00%   10 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #748      +/-   ##
==========================================
+ Coverage   76.78%   76.83%   +0.04%     
==========================================
  Files         671      674       +3     
  Lines       53196    53935     +739     
==========================================
+ Hits        40847    41440     +593     
- Misses      12349    12495     +146     
Flag        Coverage Δ
unittests   76.83% <20.50%> (+0.04%) ⬆️


Labels: algo (Add new algorithm or improve old one), enhancement (New feature or request)
Projects: None yet
2 participants