duo_ai.policies.always

Classes

`AlwaysPolicyConfig`	Configuration dataclass for AlwaysPolicy.
`AlwaysPolicy`	Policy that always selects the same agent (novice or expert) for every action.

Module Contents

class duo_ai.policies.always.AlwaysPolicyConfig

Configuration dataclass for AlwaysPolicy.

Parameters:

name (str, optional) – Name of the policy class. Default is “always”.
agent (str, optional) – The agent type to always select. Options are “novice” or “expert”. Default is “novice”.
load_path (str, optional) – Path to a checkpoint to load. Default is None.

Examples

>>> config = AlwaysPolicyConfig(agent="expert")

name: str = 'always'

agent: str = 'novice'

load_path: str | None = None
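To see the shape of the configuration without installing duo_ai, here is a minimal stand-in that mirrors the three documented fields and their defaults. The class name `AlwaysPolicyConfigSketch` is hypothetical, and the `__post_init__` check restricting `agent` to "novice" or "expert" is an assumption added for illustration; the documented class may not validate its input.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class AlwaysPolicyConfigSketch:
    """Hypothetical stand-in mirroring the documented AlwaysPolicyConfig fields."""

    name: str = "always"
    agent: str = "novice"
    load_path: Optional[str] = None

    def __post_init__(self) -> None:
        # Assumed validation: the docs list only "novice" and "expert" as options.
        if self.agent not in ("novice", "expert"):
            raise ValueError(f"agent must be 'novice' or 'expert', got {self.agent!r}")


default_cfg = AlwaysPolicyConfigSketch()
expert_cfg = AlwaysPolicyConfigSketch(agent="expert")
print(asdict(default_cfg))  # {'name': 'always', 'agent': 'novice', 'load_path': None}
print(expert_cfg.agent)     # expert
```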

class duo_ai.policies.always.AlwaysPolicy(config: AlwaysPolicyConfig, env: gym.Env)

Bases: duo_ai.core.policy.Policy

Policy that always selects the same agent (novice or expert) for every action.

Examples

>>> policy = AlwaysPolicy(AlwaysPolicyConfig(agent="novice"), env)
>>> obs = ...
>>> action = policy.act(obs)

config_cls

choice

device

config

act(obs: Any, temperature: float | None = None) → torch.Tensor

Select the constant action for a batch of observations.

Parameters:

obs (dict or np.ndarray) – Batch of observations. If a dict, it must contain the key ‘base_obs’.
temperature (float, optional) – Unused. Included for API compatibility.

Returns:

Tensor of constant actions (agent indices) for the batch.

Return type:

torch.Tensor

Raises:

ValueError – If obs is not a dict or numpy array.

Examples

>>> action = policy.act(obs)
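The behaviour described above can be approximated in a few lines. This is a minimal sketch under stated assumptions, not the duo_ai implementation: the helper `constant_act` is hypothetical, the novice=0 / expert=1 index mapping is assumed, and the sketch returns a NumPy array rather than a torch.Tensor so it runs without PyTorch. Wrapping the result with `torch.as_tensor(...)` would give the documented return type.

```python
from typing import Any, Dict

import numpy as np

# Assumed index convention: the docs say act() returns "agent indices",
# but the actual numbering used by duo_ai is not documented here.
AGENT_INDEX: Dict[str, int] = {"novice": 0, "expert": 1}


def constant_act(obs: Any, agent: str = "novice") -> np.ndarray:
    """Return the same agent index for every observation in the batch."""
    if isinstance(obs, dict):
        # Dict observations carry the batched array under 'base_obs'.
        batch = obs["base_obs"]
    elif isinstance(obs, np.ndarray):
        batch = obs
    else:
        raise ValueError(f"obs must be a dict or numpy array, got {type(obs).__name__}")
    # Only the batch size matters; the observation contents are ignored.
    return np.full(batch.shape[0], AGENT_INDEX[agent], dtype=np.int64)


obs = np.zeros((4, 3), dtype=np.float32)                   # batch of 4 observations
print(constant_act(obs, agent="expert").tolist())          # [1, 1, 1, 1]
print(constant_act({"base_obs": obs}).tolist())            # [0, 0, 0, 0]
```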

reset(done: numpy.ndarray) → None

Reset the policy state at episode boundaries.

Parameters:

done (np.ndarray) – Boolean array indicating which episodes in a batch require a reset.

Return type:

None

Examples

>>> policy.reset(done)
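In a batched rollout the `done` mask usually combines the termination and truncation flags returned by a vectorized environment. The sketch below assumes a Gymnasium-style vector env that reports the two flags separately, and uses a hypothetical `ConstantPolicySketch` whose `reset` is a no-op, on the assumption that a constant policy keeps no per-episode state.

```python
import numpy as np


class ConstantPolicySketch:
    """Hypothetical stand-in for a policy that keeps no per-episode state."""

    def reset(self, done: np.ndarray) -> None:
        # Nothing to clear: the chosen agent never changes between episodes.
        # A stateful policy would re-initialize only the entries where done is True.
        return None


# Gymnasium-style vector environments report termination and truncation separately;
# an episode boundary is reached when either flag is set.
terminations = np.array([False, True, False])
truncations = np.array([False, False, True])
done = np.logical_or(terminations, truncations)

policy = ConstantPolicySketch()
policy.reset(done)
print(done.tolist())  # [False, True, True]
```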

get_params() → Dict[str, Any]

Get the current parameters of the policy.

Returns:

Dictionary of policy parameters.

Return type:

dict

Examples

>>> params = policy.get_params()

set_params(params: Dict[str, Any]) → None

Set the parameters of the policy.

Parameters:

params (dict) – Dictionary of policy parameters to set.

Return type:

None

Examples

>>> policy.set_params(params)
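`get_params` and `set_params` are typically used as a pair, for example to copy settings between instances or to persist them. The sketch below round-trips a parameter dictionary through `pickle`. The `ParamsSketch` class and its single `choice` entry are hypothetical; the documentation does not list which keys AlwaysPolicy actually returns.

```python
import pickle
from typing import Any, Dict


class ParamsSketch:
    """Hypothetical stand-in; assumes the only parameter is the chosen agent index."""

    def __init__(self, choice: int) -> None:
        self.choice = choice

    def get_params(self) -> Dict[str, Any]:
        return {"choice": self.choice}

    def set_params(self, params: Dict[str, Any]) -> None:
        self.choice = params["choice"]


source = ParamsSketch(choice=1)
blob = pickle.dumps(source.get_params())   # serialize the parameter dict

target = ParamsSketch(choice=0)
target.set_params(pickle.loads(blob))      # restore it into another instance
print(target.choice)  # 1
```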

train() → None

Set the policy to training mode.

Return type:

None

Examples

>>> policy.train()

eval() → None

Set the policy to evaluation mode.

Return type:

None

Examples

>>> policy.eval()
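Because a constant policy has no learnable parameters, one would expect `train()` and `eval()` to leave the output of `act()` unchanged. The sketch below checks that property on a hypothetical `ModeSketch` class with a `training` flag; both the flag and the mode-independence are assumptions, not documented behaviour of AlwaysPolicy.

```python
import numpy as np


class ModeSketch:
    """Hypothetical constant policy with a training flag, mirroring train()/eval()."""

    def __init__(self, index: int) -> None:
        self.index = index
        self.training = True

    def train(self) -> None:
        self.training = True

    def eval(self) -> None:
        self.training = False

    def act(self, obs: np.ndarray) -> np.ndarray:
        # The mode flag is ignored: the output depends only on the batch size.
        return np.full(obs.shape[0], self.index, dtype=np.int64)


policy = ModeSketch(index=1)
obs = np.zeros((2, 5))
policy.train()
a_train = policy.act(obs)
policy.eval()
a_eval = policy.act(obs)
print(policy.training, np.array_equal(a_train, a_eval))  # False True
```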