
feature(yzj): add multi-agent and structured observation env (GoBigger) #39

Open · wants to merge 59 commits into base: main
Conversation

jayyoung0802 (Collaborator)

No description provided.

@puyuan1996 puyuan1996 self-assigned this Jun 1, 2023
@puyuan1996 puyuan1996 added the enhancement New feature or request label Jun 1, 2023
@@ -34,6 +36,7 @@ def __init__(
discrete_action_encoding_type: str = 'one_hot',
norm_type: Optional[str] = 'BN',
res_connection_in_dynamics: bool = False,
state_encoder=None,
Collaborator

Add a type hint for `state_encoder` and the corresponding argument documentation.
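A minimal sketch of what the requested type hint and argument comment could look like. The class name is a hypothetical stand-in for the model `__init__` under review, and `Callable` stands in for `torch.nn.Module` so the sketch stays dependency-free:

```python
from typing import Callable, Optional


class ModelInitSketch:
    """Illustrative stand-in for the model __init__ under review (hypothetical)."""

    def __init__(self, state_encoder: Optional[Callable] = None) -> None:
        # state_encoder (Optional[nn.Module] in the real code): a custom encoder
        # that maps structured observations to the latent state; when None, the
        # model falls back to its default representation network.
        self.state_encoder = state_encoder
```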

Collaborator

You can refer to the prompts here to polish the comments: https://aicarrier.feishu.cn/wiki/N4bqwLRO5iyQcAkb4HCcflbgnpR

Collaborator Author

done

beg_index = observation_shape * step_i
end_index = observation_shape * (step_i + self._cfg.model.frame_stack_num)
obs_target_batch_new[k] = v[:, beg_index:end_index]
network_output = self._learn_model.initial_inference(obs_target_batch_new)
Collaborator

The handling of the structured observations above could perhaps be abstracted into a function.
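A hedged sketch of what such a helper could look like, wrapping the index arithmetic from the snippet above. The function name is hypothetical; it slices the frame-stacked window for a given unroll step out of every entry of a structured observation batch:

```python
import numpy as np


def slice_stacked_obs(obs_batch: dict, observation_shape: int, step_i: int,
                      frame_stack_num: int) -> dict:
    """Slice the frame-stacked window for ``step_i`` out of each entry of a
    structured observation batch (hypothetical helper)."""
    beg_index = observation_shape * step_i
    end_index = observation_shape * (step_i + frame_stack_num)
    # Apply the same column slice to every key of the structured batch.
    return {k: v[:, beg_index:end_index] for k, v in obs_batch.items()}
```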

self.encoder = FCEncoder(obs_shape=18, hidden_size_list=[256, 256], activation=nn.ReLU(), norm_type=None)

def forward(self, x):
x = x['agent_state']
Collaborator

Add a comment: why `agent_state`, which keys `x` contains, and what each one means.
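A sketch of the kind of documentation the reviewer asks for. The key names other than `agent_state` are assumptions based on the structured observations used in this PR, and the wrapper class is hypothetical:

```python
class StructuredObsEncoderSketch:
    """Sketch of an encoder wrapper documenting the structured obs dict
    (key names besides 'agent_state' are assumptions)."""

    def __init__(self, encoder):
        self.encoder = encoder

    def forward(self, x: dict):
        # x is a dict of structured observation arrays. Assumed keys (hypothetical):
        #   'agent_state'       - per-agent local features, e.g. shape (B, 18)
        #   'global_state'      - features shared by all agents
        #   'agent_alone_state' - agent features excluding teammates
        # Only the per-agent slice is fed to the FC encoder here.
        return self.encoder(x['agent_state'])
```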

from pettingzoo.mpe._mpe_utils.simple_env import SimpleEnv, make_env
from pettingzoo.mpe.simple_spread.simple_spread import Scenario
from PIL import Image
import pygame
Collaborator

Optimize the imports.

tmp[k] = v[i]
tmp['action_mask'] = [1 for _ in range(*self._action_dim)]
ret_transform.append(tmp)
return {'observation': ret_transform, 'action_mask': action_mask, 'to_play': to_play}
Collaborator

Add the detailed documentation of `'observation'` to the overview docstring of the `_process_obs()` method.

last_game_priorities = [[None for _ in range(agent_num)] for _ in range(env_nums)]
# for priorities in self-play
search_values_lst = [[[] for _ in range(agent_num)] for _ in range(env_nums)]
pred_values_lst = [[[] for _ in range(agent_num)] for _ in range(env_nums)]
Collaborator

Code segments like this that appear multiple times could perhaps be abstracted into a utility method of the class.
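One way such a utility could look, as a hedged sketch: a single factory that builds the `(env_nums x agent_num)` grids of independent containers used repeatedly above. The function name and `factory` parameter are hypothetical:

```python
def make_env_agent_grid(env_nums: int, agent_num: int, factory=list):
    """Build an (env_nums x agent_num) grid of independent values
    (hypothetical utility; ``factory`` may be ``list`` or e.g. ``lambda: None``)."""
    # Call factory() per cell so the cells never share a mutable object.
    return [[factory() for _ in range(agent_num)] for _ in range(env_nums)]
```

For example, `search_values_lst = make_env_agent_grid(env_nums, agent_num)` and `last_game_priorities = make_env_agent_grid(env_nums, agent_num, factory=lambda: None)` would replace two of the nested comprehensions above.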

@@ -0,0 +1 @@
from .ptz_simple_spread_ez_config import main_config, create_config
Collaborator

Renaming every `petting_zoo` in lz (LightZero) to `pettingzoo` might be more concise.

@@ -44,6 +46,8 @@ def __init__(self, cfg: dict):
self.base_idx = 0
self.clear_time = 0

self.tmp_obs = None  # cache for the value obs list: index [46 + 4 (td_step)] is not < 50 (game_segment length), so the last obs must be kept here
Collaborator

Improve this comment; comments should be as complete and clear as possible.

Collaborator Author

done

m_obs = value_obs_list[beg_index:end_index]
m_obs = sum(m_obs, [])
m_obs = default_collate(m_obs)
m_obs = to_device(m_obs, self._cfg.device)
Collaborator

Abstract this into a data-processing function and put it in utils?
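A hedged sketch of such a utils function. Since `default_collate` and `to_device` come from ding, they are injected here as parameters so the sketch stays self-contained; the function name is hypothetical:

```python
def prepare_value_obs(value_obs_list, beg_index, end_index, collate_fn, to_device_fn):
    """Flatten and collate a slice of per-agent value observations
    (hypothetical utils function; collate_fn/to_device_fn stand in for
    ding's default_collate / to_device)."""
    m_obs = value_obs_list[beg_index:end_index]
    m_obs = sum(m_obs, [])  # flatten one nesting level (the per-agent lists)
    m_obs = collate_fn(m_obs)
    return to_device_fn(m_obs)
```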

@@ -34,6 +36,7 @@ def __init__(
discrete_action_encoding_type: str = 'one_hot',
norm_type: Optional[str] = 'BN',
res_connection_in_dynamics: bool = False,
state_encoder=None,
Collaborator

You can refer to the prompts here to polish the comments: https://aicarrier.feishu.cn/wiki/N4bqwLRO5iyQcAkb4HCcflbgnpR

"""
Overview:
The policy class for Multi Agent EfficientZero.
"""
Collaborator

Explain the differences between the current multi-agent algorithm and the single-agent algorithm, and give an overview of how independent learning is currently implemented.

Collaborator Author

done

)
# NOTE: Convert the ``action_index_in_legal_action_set`` to the corresponding ``action`` in the entire action set.
action = np.where(action_mask[i] == 1.0)[0][action_index_in_legal_action_set]
output[i // agent_num]['action'].append(action)
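A hedged sketch of the conversion this snippet performs, factored into a standalone function (the function name is hypothetical): an index into the legal-action subset is mapped back to its index in the full action space via the positions where the mask is 1.

```python
import numpy as np


def legal_index_to_env_action(action_mask, action_index_in_legal_action_set):
    """Map an index within the legal-action subset back to the corresponding
    index in the full action space (hypothetical helper)."""
    # Positions of all legal actions in the full action space.
    legal_actions = np.where(np.asarray(action_mask) == 1.0)[0]
    return int(legal_actions[action_index_in_legal_action_set])
```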
Collaborator

Add comments here.

Collaborator Author

done

"""
Overview:
The policy class for Multi Agent MuZero.
"""
Collaborator

Same as above.

Collaborator Author

done

from ding.utils import ENV_REGISTRY, deep_merge_dicts
import math
from easydict import EasyDict
try:
Collaborator

Please add a link to the original GoBigger repository, and describe how this version differs from it?

Collaborator Author (@jayyoung0802, Dec 7, 2023)

Added the link inside the try/except block.


main_config = dict(
exp_name=
f'data_mz_ctree/{env_name}_muzero_ns{num_simulations}_upc{update_per_collect}_rr{reanalyze_ratio}_seed{seed}',
Collaborator

How does `ptz_simple_spread_mz` perform at the moment? If the performance is not good, please remove the ptz-related parts for now.

Collaborator Author

ok

max_env_step: Optional[int] = int(1e10),
) -> 'Policy': # noqa
"""
Overview:
Collaborator

Why did ptz need its own separate entry before?

Collaborator Author

Because a custom encoder needs to be passed in separately.

@@ -47,12 +47,12 @@ def train_muzero(
"""
Collaborator

Merge the main branch, and add the mz/ez baseline results to this PR's description. Then, once the code is polished, create a new branch `multi-agent`, push it to opendilab/lightzero, and note in this PR that the latest stable code lives on the `multi-agent` branch.

Labels: enhancement (New feature or request), environment (New or improved environment)

3 participants