duo_ai.policies.random
Classes
RandomPolicyConfig | Configuration dataclass for RandomPolicy.
RandomPolicy | Policy that selects the expert action with a fixed probability.
Module Contents
- class duo_ai.policies.random.RandomPolicyConfig
Configuration dataclass for RandomPolicy.
- Parameters:
cls (str, optional) – Name of the policy class. Default is “RandomPolicy”.
prob (float, optional) – Probability of selecting the expert action. Default is None. Setting a value prevents RandomAlgorithm from conducting a grid search.
load_path (str, optional) – Path to a checkpoint to load. Default is None.
- cls
Name of the policy class.
- Type:
str
- prob
Probability of selecting the expert action.
- Type:
float or None
- load_path
Path to a checkpoint to load.
- Type:
str or None
Examples
>>> config = RandomPolicyConfig(prob=0.7)
>>> print(config.cls)
RandomPolicy
>>> print(config.prob)
0.7
- name: str = 'random'
- prob: float | None = None
- load_path: str | None = None
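The effect of leaving prob unset can be pictured with a small stand-alone sketch. The dataclass below only mirrors the three fields listed above. The helper candidate_probs and its grid values are hypothetical illustrations, not part of duo_ai, and the way RandomAlgorithm actually searches may differ.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SketchRandomPolicyConfig:
    """Stand-in that mirrors the documented fields; not the real duo_ai class."""

    name: str = "random"
    prob: Optional[float] = None
    load_path: Optional[str] = None


def candidate_probs(config: SketchRandomPolicyConfig) -> List[float]:
    """Hypothetical helper: probabilities a search would have to try."""
    if config.prob is not None:
        # A fixed prob leaves nothing to search over.
        return [config.prob]
    # Assumed grid 0.0, 0.1, ..., 1.0, for illustration only.
    return [i / 10 for i in range(11)]
```

With prob=0.7 the helper returns a single candidate, while the default configuration yields the whole assumed grid.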
- class duo_ai.policies.random.RandomPolicy(config: RandomPolicyConfig, env: gym.Env)
Bases: duo_ai.core.policy.Policy
Policy that selects the expert action with a fixed probability.
Examples
>>> policy = RandomPolicy(RandomPolicyConfig(prob=0.7), env)
>>> obs = ...
>>> action = policy.act(obs)
- config_cls
- prob
- device
- EXPERT
- config
- act(obs: object, temperature: float | None = None) -> torch.Tensor
Select an action for each observation in the batch, choosing the expert action with the configured probability.
- Parameters:
obs (dict or np.ndarray) – Batch of observations. If dict, must contain ‘base_obs’.
temperature (float, optional) – Unused. Included for API compatibility.
- Returns:
Tensor of selected actions (expert or not) for the batch.
- Return type:
torch.Tensor
- Raises:
ValueError – If obs is not a dict or numpy array.
Examples
>>> action = policy.act(obs)
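The selection rule described for act can be sketched without torch as one independent Bernoulli draw per batch element. This is a minimal illustration, not the library's implementation: the function sample_actions, the action indices EXPERT = 1 and NOVICE = 0, and the use of Python lists instead of a torch.Tensor are all assumptions.

```python
import random
from typing import List, Optional

EXPERT = 1  # assumed index of the expert action
NOVICE = 0  # assumed index of the non-expert action


def sample_actions(
    batch_size: int, prob: float, rng: Optional[random.Random] = None
) -> List[int]:
    """Return one action per batch element, EXPERT with probability prob."""
    rng = rng or random.Random()
    # rng.random() is uniform on [0, 1), so the comparison is true
    # with probability prob, independently for every element.
    return [EXPERT if rng.random() < prob else NOVICE for _ in range(batch_size)]
```

Passing a seeded random.Random makes the draws reproducible, which is convenient when comparing runs with different values of prob.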
- reset(done: numpy.ndarray) -> None
Reset the policy state at episode boundaries.
- Parameters:
done (np.ndarray) – Boolean array indicating which episodes in a batch require a reset.
- Return type:
None
Examples
>>> policy.reset(done)
- set_params(params: dict) -> None
Set the parameters of the policy.
- Parameters:
params (dict) – Dictionary of policy parameters to set.
- Return type:
None
Examples
>>> policy.set_params({'prob': 0.5})
- get_params() -> dict
Get the current parameters of the policy.
- Returns:
Dictionary of policy parameters.
- Return type:
dict
Examples
>>> params = policy.get_params()
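A typical use of get_params and set_params together is to save the current parameters, try another value, and restore the original. The sketch below shows that round trip on a hypothetical stand-in class. It assumes the parameter dictionary holds a single 'prob' key, as the set_params example suggests; the real policy may store more.

```python
from typing import Dict


class SketchPolicy:
    """Hypothetical stand-in exposing the same two methods; not duo_ai code."""

    def __init__(self, prob: float) -> None:
        self.prob = prob

    def get_params(self) -> Dict[str, float]:
        # Assumed layout: a single 'prob' entry.
        return {"prob": self.prob}

    def set_params(self, params: Dict[str, float]) -> None:
        self.prob = params["prob"]


policy = SketchPolicy(prob=0.7)
saved = policy.get_params()       # snapshot of the current parameters
policy.set_params({"prob": 0.5})  # try a different probability
policy.set_params(saved)          # restore the snapshot
```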