duo_ai.policies.random

Classes

RandomPolicyConfig

Configuration dataclass for RandomPolicy.

RandomPolicy

Policy that selects the expert action with a fixed probability.

Module Contents

class duo_ai.policies.random.RandomPolicyConfig

Configuration dataclass for RandomPolicy.

Parameters:
  • cls (str, optional) – Name of the policy class. Default is “RandomPolicy”.

  • prob (float, optional) – Probability of selecting the expert action. Setting this value prevents RandomAlgorithm from conducting a grid search. Default is None.

  • load_path (str, optional) – Path to a checkpoint to load. Default is None.

cls

Name of the policy class.

Type:

str

prob

Probability of selecting the expert action.

Type:

float or None

load_path

Path to a checkpoint to load.

Type:

str or None

Examples

>>> config = RandomPolicyConfig(prob=0.7)
>>> print(config.cls)
RandomPolicy
>>> print(config.prob)
0.7
name: str = 'random'
prob: float | None = None
load_path: str | None = None
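
As a rough illustration of how these fields fit together, the sketch below defines a stand-in dataclass with the same three fields and defaults as the signature above. It also adds a hypothetical helper, candidate_probs, showing one way an algorithm could skip a grid search once prob is set. The stand-in class name, the helper, and the grid values are assumptions made for illustration; none of them are part of duo_ai.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class RandomPolicyConfigSketch:
    """Stand-in mirroring the documented fields; not the duo_ai class."""

    name: str = "random"
    prob: Optional[float] = None
    load_path: Optional[str] = None


def candidate_probs(
    config: RandomPolicyConfigSketch,
    grid: Sequence[float] = (0.0, 0.25, 0.5, 0.75, 1.0),  # assumed grid values
) -> List[float]:
    """Hypothetical helper: list the probabilities a search would try."""
    if config.prob is not None:
        # A fixed probability leaves nothing to search over.
        return [config.prob]
    # No probability given, so every grid point is a candidate.
    return list(grid)
```

With prob left at None the helper returns the whole grid; with prob=0.7 it returns only that value.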
class duo_ai.policies.random.RandomPolicy(config: RandomPolicyConfig, env: gym.Env)

Bases: duo_ai.core.policy.Policy

Policy that selects the expert action with a fixed probability.

Examples

>>> policy = RandomPolicy(RandomPolicyConfig(prob=0.7), env)
>>> obs = ...
>>> action = policy.act(obs)
config_cls
prob
device
EXPERT
config
act(obs: object, temperature: float | None = None) → torch.Tensor

For each observation in the batch, select the expert action with the configured probability prob.

Parameters:
  • obs (dict or np.ndarray) – Batch of observations. If dict, must contain ‘base_obs’.

  • temperature (float, optional) – Unused. Included for API compatibility.

Returns:

Tensor of selected actions (expert or not) for the batch.

Return type:

torch.Tensor

Raises:

ValueError – If obs is not a dict or numpy array.

Examples

>>> action = policy.act(obs)
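
The selection rule described above can be sketched with the standard library alone. This is a minimal sketch, not the duo_ai implementation: it returns a plain list instead of a torch.Tensor, accepts lists and tuples where the real method expects a NumPy array, and the action indices EXPERT = 1 and NOVICE = 0 and both function names are assumptions.

```python
import random

EXPERT = 1  # assumed index of the expert action
NOVICE = 0  # assumed index of the non-expert action


def batch_size_of(obs):
    """Mirror the documented input check, with lists standing in for arrays."""
    if isinstance(obs, dict):
        # Dict observations are documented to carry a 'base_obs' entry.
        return len(obs["base_obs"])
    if isinstance(obs, (list, tuple)):
        return len(obs)
    raise ValueError("obs must be a dict or an array-like batch")


def sample_actions(obs, prob, rng=None):
    """Draw one action per observation; EXPERT with probability `prob`."""
    if rng is None:
        rng = random.Random()
    n = batch_size_of(obs)
    # rng.random() lies in [0, 1), so prob=1.0 always picks EXPERT
    # and prob=0.0 never does.
    return [EXPERT if rng.random() < prob else NOVICE for _ in range(n)]
```

Passing a seeded random.Random instance as rng makes the draws reproducible, which can help when testing code built on top of such a policy.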
reset(done: numpy.ndarray) → None

Reset the policy state at episode boundaries.

Parameters:

done (np.ndarray) – Boolean array indicating which episodes in a batch require a reset.

Return type:

None

Examples

>>> policy.reset(done)
set_params(params: dict) → None

Set the parameters of the policy.

Parameters:

params (dict) – Dictionary of policy parameters to set.

Return type:

None

Examples

>>> policy.set_params({'prob': 0.5})
get_params() → dict

Get the current parameters of the policy.

Returns:

Dictionary of policy parameters.

Return type:

dict

Examples

>>> params = policy.get_params()
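
A round trip through set_params and get_params might look like the sketch below. It assumes the parameter dictionary holds a single 'prob' key, as the set_params example suggests; the class ParamHolderSketch is hypothetical and only stands in for the policy.

```python
class ParamHolderSketch:
    """Hypothetical stand-in exposing the documented getter and setter."""

    def __init__(self, prob=None):
        self.prob = prob

    def get_params(self):
        # Assumption: 'prob' is the only parameter worth exporting.
        return {"prob": self.prob}

    def set_params(self, params):
        # Read the same key that get_params writes.
        self.prob = params["prob"]


source = ParamHolderSketch(prob=0.5)
target = ParamHolderSketch()
# Copy parameters from one holder to another.
target.set_params(source.get_params())
```

After the copy, both holders report the same parameters.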
train() → None

Set the policy to training mode.

Return type:

None

Examples

>>> policy.train()
eval() → None

Set the policy to evaluation mode.

Return type:

None

Examples

>>> policy.eval()