duo_ai.models.ppo¶

Classes¶

`PPOModelOutput`	Output container for PPO model forward pass.
`ImpalaPPOModelConfig`	Configuration dataclass for ImpalaPPOModel.
`ImpalaPPOModel`	PPO model using an IMPALA encoder for feature extraction.
`ImpalaCoordPPOModelConfig`	Configuration dataclass for ImpalaCoordPPOModel.
`ImpalaCoordPPOModel`	PPO model for coordination environments, supporting multiple feature types.

Module Contents¶

class duo_ai.models.ppo.PPOModelOutput[source]¶

Output container for PPO model forward pass.

Parameters:

logits (torch.Tensor) – The raw action logits output by the policy head.
value (torch.Tensor) – The value function prediction output by the value head.
hidden (torch.Tensor) – The hidden feature representation from the model.

Examples

>>> output = PPOModelOutput(logits, value, hidden)

logits: torch.Tensor¶

value: torch.Tensor¶

hidden: torch.Tensor¶

class duo_ai.models.ppo.ImpalaPPOModelConfig[source]¶

Configuration dataclass for ImpalaPPOModel.

Parameters:: name (str, optional) – Name of the model class. Default is “ImpalaPPOModel”.

Examples

>>> config = ImpalaPPOModelConfig()

name: str = 'impala_ppo'¶

class duo_ai.models.ppo.ImpalaPPOModel(config: ImpalaPPOModelConfig, env: gym.Env)[source]¶

Bases: torch.nn.Module

PPO model using an IMPALA encoder for feature extraction.

Examples

>>> model = ImpalaPPOModel(ImpalaPPOModelConfig(), env)
>>> obs = torch.randn(8, 3, 64, 64)
>>> out = model(obs)
>>> print(out.logits.shape, out.value.shape)

config_cls¶

device¶

embedder¶

hidden_dim = 256¶

fc_policy¶

fc_value¶

logit_dim¶

forward(obs: Any) → PPOModelOutput[source]¶

Forward pass of the ImpalaPPOModel.

Parameters:: obs (torch.Tensor or np.ndarray) – Observation input to the model.
Returns:: Output container with logits, value, and hidden features.
Return type:: PPOModelOutput

Examples

>>> out = model(obs)
>>> print(out.logits.shape, out.value.shape)

class duo_ai.models.ppo.ImpalaCoordPPOModelConfig[source]¶

Configuration dataclass for ImpalaCoordPPOModel.

Parameters:

name (str, optional) – Name of the model class. Default is “ImpalaCoordPPOModel”.
feature_type (str, optional) – Type of feature representation to use. Options include: “obs”, “hidden”, “hidden_obs”, “dist”, “hidden_dist”, “obs_dist”, “obs_hidden_dist”. Default is “obs”.

Examples

>>> config = ImpalaCoordPPOModelConfig(feature_type="obs_hidden_dist")

name: str = 'impala_coord_ppo'¶

feature_type: str = 'obs'¶

class duo_ai.models.ppo.ImpalaCoordPPOModel(config: ImpalaCoordPPOModelConfig, env: gym.Env)[source]¶

Bases: torch.nn.Module

PPO model for coordination environments, supporting multiple feature types.

Examples

>>> model = ImpalaCoordPPOModel(ImpalaCoordPPOModelConfig(), env)
>>> obs = {"base_obs": ..., "novice_hidden": ..., "novice_logits": ...}
>>> out = model(obs)
>>> print(out.logits.shape, out.value.shape)

config_cls¶

device¶

embedder¶

feature_type¶

fc_policy¶

fc_value¶

logit_dim¶

forward(obs: Dict[str, Any]) → PPOModelOutput[source]¶

Forward pass of the ImpalaCoordPPOModel.

Parameters:: obs (dict) – Dictionary containing observation components (base_obs, novice_hidden, novice_logits).
Returns:: Output container with logits, value, and hidden features.
Return type:: PPOModelOutput

Examples

>>> out = model(obs)
>>> print(out.logits.shape, out.value.shape)