duo_ai.policies.random

Classes

`RandomPolicyConfig`	Configuration dataclass for RandomPolicy.
`RandomPolicy`	Policy that selects the expert action with a fixed probability.

Module Contents

class duo_ai.policies.random.RandomPolicyConfig

Configuration dataclass for RandomPolicy.

Parameters:

cls (str, optional) – Name of the policy class. Default is “RandomPolicy”.
prob (float, optional) – Probability of selecting the expert action. If set, RandomAlgorithm uses this value and does not conduct a grid search. Default is None.
load_path (str, optional) – Path to a checkpoint to load. Default is None.
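
The interaction between prob and the grid search can be pictured with a minimal sketch. Everything here is an assumption made for illustration: choose_prob, toy_score, and the candidate grid are hypothetical names, and the actual RandomAlgorithm may search differently. The sketch only shows the idea that a configured value short-circuits the search, while None leaves the choice to the search.

```python
from typing import Callable, Optional, Sequence


def choose_prob(
    configured_prob: Optional[float],
    candidates: Sequence[float],
    score: Callable[[float], float],
) -> float:
    """Return the configured prob if given, else the best-scoring candidate."""
    if configured_prob is not None:
        # A fixed value bypasses the search entirely.
        return configured_prob
    return max(candidates, key=score)


def toy_score(prob: float) -> float:
    # Hypothetical objective that peaks at prob = 0.6. A real run would
    # evaluate the policy in the environment instead.
    return -((prob - 0.6) ** 2)


grid = [i / 10 for i in range(11)]  # 0.0, 0.1, ..., 1.0
searched = choose_prob(None, grid, toy_score)  # search picks the best candidate
fixed = choose_prob(0.7, grid, toy_score)      # configured value wins, no search
```

With this toy objective the search settles on 0.6, whereas passing 0.7 returns 0.7 without scoring any candidate.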

cls

Name of the policy class.

Type: str

prob

Probability of selecting the expert action.

Type: float or None

load_path

Path to a checkpoint to load.

Type: str or None

Examples

>>> config = RandomPolicyConfig(prob=0.7)
>>> print(config.cls)
RandomPolicy
>>> print(config.prob)
0.7

name: str = 'random'

prob: float | None = None

load_path: str | None = None

class duo_ai.policies.random.RandomPolicy(config: RandomPolicyConfig, env: gym.Env)

Bases: duo_ai.core.policy.Policy

Policy that selects the expert action with a fixed probability.

Examples

>>> policy = RandomPolicy(RandomPolicyConfig(prob=0.7), env)
>>> obs = ...
>>> action = policy.act(obs)

config_cls

prob

device

EXPERT

config

act(obs: object, temperature: float | None = None) → torch.Tensor

Sample an action for each observation in the batch, choosing the expert action with the configured probability prob.

Parameters:

obs (dict or np.ndarray) – Batch of observations. If dict, must contain ‘base_obs’.
temperature (float, optional) – Unused. Included for API compatibility.

Returns:

Tensor of selected actions (expert or not) for the batch.

Return type:

torch.Tensor

Raises:

ValueError – If obs is neither a dict nor a NumPy array.

Examples

>>> action = policy.act(obs)
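
The sampling rule described for act can be sketched in plain Python. This is an illustration under stated assumptions, not the library's implementation: the integer codes EXPERT = 1 and NOVICE = 0, the helper sample_actions, and the use of the standard-library random module (instead of torch tensors) are all hypothetical choices made to keep the example self-contained.

```python
import random
from typing import List, Optional

EXPERT = 1  # assumed action code for "take the expert action"
NOVICE = 0  # assumed action code for the alternative


def sample_actions(
    batch_size: int, prob: float, rng: Optional[random.Random] = None
) -> List[int]:
    """Draw one independent Bernoulli(prob) decision per batch element."""
    if not 0.0 <= prob <= 1.0:
        raise ValueError(f"prob must be in [0, 1], got {prob}")
    rng = rng or random.Random()
    # rng.random() is uniform on [0, 1), so the comparison is true with
    # probability prob.
    return [EXPERT if rng.random() < prob else NOVICE for _ in range(batch_size)]


actions = sample_actions(10_000, prob=0.7, rng=random.Random(0))
expert_rate = sum(a == EXPERT for a in actions) / len(actions)
```

Over a large batch, expert_rate should land close to the configured probability, and each element is drawn independently of the observation contents.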

reset(done: numpy.ndarray) → None

Reset the policy state at episode boundaries.

Parameters:

done (np.ndarray) – Boolean array indicating which episodes in a batch require a reset.

Return type:

None

Examples

>>> policy.reset(done)

set_params(params: dict) → None

Set the parameters of the policy.

Parameters:

params (dict) – Dictionary of policy parameters to set.

Return type:

None

Examples

>>> policy.set_params({'prob': 0.5})

get_params() → dict

Get the current parameters of the policy.

Returns:

Dictionary of policy parameters.

Return type:

dict

Examples

>>> params = policy.get_params()
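
One common use of a get_params / set_params pair is copying parameters from one policy instance to another. The sketch below uses a hypothetical stand-in class, ToyRandomPolicy, and assumes the parameter dictionary holds a single 'prob' key, as the set_params example suggests. The real RandomPolicy may store additional entries.

```python
from typing import Any, Dict, Optional


class ToyRandomPolicy:
    """Hypothetical stand-in exposing only the parameter accessors."""

    def __init__(self, prob: Optional[float] = None) -> None:
        self.prob = prob

    def get_params(self) -> Dict[str, Any]:
        # Build a fresh dict so callers cannot mutate internal state.
        return {"prob": self.prob}

    def set_params(self, params: Dict[str, Any]) -> None:
        self.prob = params["prob"]


source = ToyRandomPolicy(prob=0.5)
target = ToyRandomPolicy()
# Round trip: whatever get_params returns should be accepted by set_params.
target.set_params(source.get_params())
```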

train() → None

Set the policy to training mode.

Return type:

None

Examples

>>> policy.train()

eval() → None

Set the policy to evaluation mode.

Return type:

None

Examples

>>> policy.eval()