feature(yzj): add multi-agent and structured observation env (GoBigger) #39
base: main
Conversation
@@ -34,6 +36,7 @@ def __init__(
        discrete_action_encoding_type: str = 'one_hot',
        norm_type: Optional[str] = 'BN',
        res_connection_in_dynamics: bool = False,
        state_encoder=None,
Add type hints for `state_encoder` and the corresponding argument documentation.
You can refer to the prompt at https://aicarrier.feishu.cn/wiki/N4bqwLRO5iyQcAkb4HCcflbgnpR to improve the comments.
done
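A minimal sketch of what the requested annotation could look like (the class name, docstring wording, and identity fallback are illustrative assumptions, not the PR's actual code):

```python
from typing import Optional

import torch.nn as nn


class DynamicsNetworkSketch(nn.Module):
    """Illustrative fragment showing a type-hinted ``state_encoder`` argument."""

    def __init__(
        self,
        discrete_action_encoding_type: str = 'one_hot',
        norm_type: Optional[str] = 'BN',
        res_connection_in_dynamics: bool = False,
        state_encoder: Optional[nn.Module] = None,
    ) -> None:
        """
        Arguments:
            - state_encoder (:obj:`Optional[nn.Module]`): Custom encoder mapping raw \
                (possibly structured, dict-valued) observations to the latent state. \
                If ``None``, the default representation network is used.
        """
        super().__init__()
        # Hypothetical fallback: identity mapping when no custom encoder is supplied.
        self.state_encoder = state_encoder if state_encoder is not None else nn.Identity()
```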
beg_index = observation_shape * step_i
end_index = observation_shape * (step_i + self._cfg.model.frame_stack_num)
obs_target_batch_new[k] = v[:, beg_index:end_index]
network_output = self._learn_model.initial_inference(obs_target_batch_new)
The handling of structured observations above could perhaps be abstracted into a function.
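One hedged way to factor out the repeated slicing, assuming each dict field stacks frames along dim 1 (the function name and signature are hypothetical):

```python
import numpy as np


def slice_structured_obs(obs_batch: dict, step_i: int, observation_shape: int,
                         frame_stack_num: int) -> dict:
    """Extract the frame-stack window for ``step_i`` from every field of a
    structured (dict) observation batch. The window spans columns
    ``[observation_shape * step_i, observation_shape * (step_i + frame_stack_num))``."""
    beg_index = observation_shape * step_i
    end_index = observation_shape * (step_i + frame_stack_num)
    return {k: v[:, beg_index:end_index] for k, v in obs_batch.items()}


# Example: batch of 2 envs, 4 stacked frames of a 3-dim observation.
batch = {'agent_state': np.arange(24).reshape(2, 12)}
window = slice_structured_obs(batch, step_i=1, observation_shape=3, frame_stack_num=2)
```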
zoo/petting_zoo/model/model.py
Outdated
self.encoder = FCEncoder(obs_shape=18, hidden_size_list=[256, 256], activation=nn.ReLU(), norm_type=None)

def forward(self, x):
    x = x['agent_state']
Add a comment explaining why `agent_state` is used here, which keys `x` contains, and what each of them means.
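As a sketch, the requested docstring might read like this (the key list and shapes are assumptions inferred from the snippet, not the env's actual contract):

```python
import torch
import torch.nn as nn


class AgentStateEncoder(nn.Module):
    """Toy stand-in for the FCEncoder-based module in the diff."""

    def __init__(self, obs_dim: int = 18, hidden: int = 256) -> None:
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU())

    def forward(self, x: dict) -> torch.Tensor:
        """
        Arguments:
            - x (:obj:`dict`): structured observation; assumed keys include
              ``'agent_state'`` (per-agent features, shape ``(B, 18)``) and possibly
              ``'global_state'`` / ``'action_mask'``. Only ``'agent_state'`` is
              encoded because this module produces the per-agent latent state.
        """
        return self.encoder(x['agent_state'])
```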
from pettingzoo.mpe._mpe_utils.simple_env import SimpleEnv, make_env
from pettingzoo.mpe.simple_spread.simple_spread import Scenario
from PIL import Image
import pygame
Optimize these imports.
tmp[k] = v[i]
tmp['action_mask'] = [1 for _ in range(*self._action_dim)]
ret_transform.append(tmp)
return {'observation': ret_transform, 'action_mask': action_mask, 'to_play': to_play}
Add the detailed documentation of `'observation'` to the overview of the `_process_obs()` method.
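A hedged sketch of what such an overview (and the surrounding transform) could look like; the field semantics and the `to_play` placeholder are assumptions:

```python
def split_obs_per_agent(obs: dict, agent_num: int, action_dim: int, to_play: int = -1) -> dict:
    """
    Overview (sketch of the requested documentation; semantics are assumptions):
        Split a joint structured observation into per-agent entries. The returned
        ``'observation'`` is a list of length ``agent_num``; element ``i`` holds
        agent ``i``'s slice of every field plus a flat ``'action_mask'`` of length
        ``action_dim`` (all ones here, i.e. every action legal). ``to_play`` uses
        ``-1`` as a common "simultaneous-move" placeholder.
    """
    per_agent = []
    for i in range(agent_num):
        tmp = {k: v[i] for k, v in obs.items()}
        tmp['action_mask'] = [1] * action_dim
        per_agent.append(tmp)
    action_mask = [e['action_mask'] for e in per_agent]
    return {'observation': per_agent, 'action_mask': action_mask, 'to_play': to_play}
```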
last_game_priorities = [[None for _ in range(agent_num)] for _ in range(env_nums)]
# for priorities in self-play
search_values_lst = [[[] for _ in range(agent_num)] for _ in range(env_nums)]
pred_values_lst = [[[] for _ in range(agent_num)] for _ in range(env_nums)]
Code segments like this that appear several times could perhaps be abstracted into a utility method of the class.
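One possible utility for this repeated pattern (the name and signature are hypothetical):

```python
def init_per_agent_lists(env_nums: int, agent_num: int, fill=None) -> list:
    """Hypothetical helper replacing the repeated
    ``[[... for _ in range(agent_num)] for _ in range(env_nums)]`` pattern.
    ``fill`` is either a constant or a zero-argument factory (e.g. ``list``)
    that produces a fresh value for each slot."""
    make = fill if callable(fill) else (lambda: fill)
    return [[make() for _ in range(agent_num)] for _ in range(env_nums)]


# Usage mirroring the snippet above:
last_game_priorities = init_per_agent_lists(2, 3)        # grid of None
search_values_lst = init_per_agent_lists(2, 3, list)     # grid of independent []
```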
zoo/petting_zoo/config/__init__.py
Outdated
@@ -0,0 +1 @@
from .ptz_simple_spread_ez_config import main_config, create_config
Renaming `petting_zoo` to `pettingzoo` everywhere in LightZero might be cleaner.
@@ -44,6 +46,8 @@ def __init__(self, cfg: dict):
        self.base_idx = 0
        self.clear_time = 0

        self.tmp_obs = None  # for value obs list [46 + 4(td_step)] not < 50(game_segment)
Improve this comment; comments should be as complete and clear as possible.
done
m_obs = value_obs_list[beg_index:end_index]
m_obs = sum(m_obs, [])
m_obs = default_collate(m_obs)
m_obs = to_device(m_obs, self._cfg.device)
Abstract this into a data-processing function and put it in utils?
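The repeated slice → flatten → collate → to-device steps could live in a small utils helper; a sketch under the assumption that each element collates into a dict of tensors (`default_collate` here is torch's, where the PR may use ding's):

```python
import torch
from torch.utils.data.dataloader import default_collate


def collate_obs_window(value_obs_list, beg_index, end_index, device='cpu'):
    """Hypothetical helper: slice a window of per-step obs lists, flatten one
    nesting level, collate into batched tensors, and move them to ``device``."""
    m_obs = value_obs_list[beg_index:end_index]
    m_obs = sum(m_obs, [])          # flatten: list of lists -> flat list
    m_obs = default_collate(m_obs)  # stack into batched tensors
    if isinstance(m_obs, torch.Tensor):
        return m_obs.to(device)
    return {k: v.to(device) for k, v in m_obs.items()}
```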
""" | ||
Overview: | ||
The policy class for Multi Agent EfficientZero. | ||
""" |
Explain the differences between the current multi-agent algorithms and the single-agent ones, and give an overview of how the current independent-learning scheme is implemented.
done
)
# NOTE: Convert the ``action_index_in_legal_action_set`` to the corresponding ``action`` in the entire action set.
action = np.where(action_mask[i] == 1.0)[0][action_index_in_legal_action_set]
output[i // agent_num]['action'].append(action)
Add comments here.
done
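A toy illustration (made-up values) of the conversion the NOTE describes: MCTS returns an index into the legal-action subset, and `np.where` recovers that action's index in the full action set.

```python
import numpy as np

action_mask = np.array([0., 1., 0., 1., 1.])    # actions 1, 3 and 4 are legal
legal_actions = np.where(action_mask == 1.0)[0]  # indices of legal actions: [1, 3, 4]
action_index_in_legal_action_set = 2             # "pick the third legal action"
action = legal_actions[action_index_in_legal_action_set]  # -> 4, i.e. action 4 overall
```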
""" | ||
Overview: | ||
The policy class for Multi Agent MuZero. | ||
""" |
Same as above.
done
from ding.utils import ENV_REGISTRY, deep_merge_dicts
import math
from easydict import EasyDict
try:
Could you add a link to the original GoBigger repository, and describe how this version differs from it?
Added the link inside the try/except block.
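The guarded-import pattern mentioned here might look like the following sketch; the module name is an assumption, while the URL is the official GoBigger repository:

```python
# Guarded import so the config module still loads when gobigger is absent.
try:
    import gobigger  # hypothetical top-level module name
    GOBIGGER_AVAILABLE = True
except ImportError:
    # Original environment: https://github.com/opendilab/GoBigger
    GOBIGGER_AVAILABLE = False
```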
main_config = dict(
    exp_name=
    f'data_mz_ctree/{env_name}_muzero_ns{num_simulations}_upc{update_per_collect}_rr{reanalyze_ratio}_seed{seed}',
What is the current performance of `ptz_simple_spread_mz`? If it is not good, remove the ptz-related parts for now.
ok
max_env_step: Optional[int] = int(1e10),
) -> 'Policy':  # noqa
"""
Overview:
Why did ptz need its own entry before?
Because it needs to pass in a custom encoder separately.
lzero/entry/train_muzero.py
Outdated
@@ -47,12 +47,12 @@ def train_muzero(
""" |
Merge the main branch and add the MuZero/EfficientZero baseline results to the PR description. After polishing, create a new `multi-agent` branch, push it to opendilab/lightzero, and note at the end of this PR that the latest stable code lives on the `multi-agent` branch.
No description provided.