duo_ai.core.policy

Classes

Policy

Abstract base class for all policies in the Duo framework.

Module Contents

class duo_ai.core.policy.Policy

Bases: abc.ABC

Abstract base class for all policies in the Duo framework.

This class defines the interface that all policy implementations must follow.

Examples

>>> from duo_ai.core.policy import Policy
>>> class MyPolicy(Policy):
...     def act(self, obs):
...         return ...
...     def reset(self, done):
...         pass
...     def set_params(self, params):
...         pass
...     def get_params(self):
...         return {}
...     def train(self):
...         pass
...     def eval(self):
...         pass
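
Because Policy derives from abc.ABC, Python refuses to instantiate a subclass that leaves any abstract method undefined. The sketch below illustrates that behaviour with a local stand-in, `PolicyInterface`, which mirrors the six documented methods. The stand-in, `IncompletePolicy`, and `try_instantiate` are hypothetical names introduced so the snippet runs without duo_ai installed; they are not taken from the framework's source.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict


class PolicyInterface(ABC):
    """Hypothetical stand-in mirroring the documented Policy methods."""

    @abstractmethod
    def act(self, obs: Any, *args: Any, **kwargs: Any) -> Any: ...

    @abstractmethod
    def reset(self, done: Any) -> None: ...

    @abstractmethod
    def set_params(self, params: Dict[str, Any]) -> None: ...

    @abstractmethod
    def get_params(self) -> Dict[str, Any]: ...

    @abstractmethod
    def train(self) -> None: ...

    @abstractmethod
    def eval(self) -> None: ...


class IncompletePolicy(PolicyInterface):
    """Implements only act, so the class is still abstract."""

    def act(self, obs: Any, *args: Any, **kwargs: Any) -> Any:
        return obs


def try_instantiate(cls):
    """Return an instance, or the TypeError message if cls is still abstract."""
    try:
        return cls()
    except TypeError as exc:
        return str(exc)


result = try_instantiate(IncompletePolicy)
print(result)
```

The printed message names the methods that still need an implementation, which is a quick way to check a new subclass against the interface.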

abstract act(obs: Any, *args: Any, **kwargs: Any) → torch.Tensor

Select an action based on the given observation.

Parameters:

obs (Any) – The current observation from the environment.
*args (Any) – Additional positional arguments.
**kwargs (Any) – Additional keyword arguments.

Returns:

The selected action. The format depends on the policy implementation.

Return type:

torch.Tensor

Examples

>>> action = policy.act(obs)

abstract reset(done: numpy.ndarray) → None

Reset the internal state of the policy.

This method should be overridden by subclasses to implement any necessary logic for resetting the policy’s state to its initial configuration, such as clearing hidden states or episode-specific variables.

Parameters:

done (numpy.ndarray) – Boolean array indicating which episodes in a batch require a reset.

Return type:

None

Examples

>>> policy.reset(done)
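
One way to read the `done` argument is as a per-environment mask: state is cleared only for the entries that are true, and the rest carry on. The sketch below assumes that interpretation. `HiddenStateBuffer` and its step counter are hypothetical stand-ins for a recurrent policy's hidden state, and a plain list of booleans is used in place of a numpy.ndarray so the snippet has no dependencies; the loop works the same way on a boolean array.

```python
from typing import List, Sequence


class HiddenStateBuffer:
    """Hypothetical per-environment state holder with a Policy-style reset."""

    def __init__(self, num_envs: int) -> None:
        self.num_envs = num_envs
        # One step counter per environment, standing in for a hidden state.
        self.steps: List[int] = [0] * num_envs

    def step(self) -> None:
        """Advance every environment's state by one step."""
        self.steps = [s + 1 for s in self.steps]

    def reset(self, done: Sequence[bool]) -> None:
        """Clear state only where done is true; other entries are untouched."""
        if len(done) != self.num_envs:
            raise ValueError("done must have one entry per environment")
        for i, finished in enumerate(done):
            if finished:
                self.steps[i] = 0


buffer = HiddenStateBuffer(num_envs=3)
buffer.step()
buffer.step()
# Only the second environment finished its episode.
buffer.reset([False, True, False])
print(buffer.steps)
```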

abstract set_params(params: Dict[str, Any]) → None

Set the parameters of the policy.

This method should be overridden by subclasses to update the policy’s parameters based on the provided dictionary, such as loading model weights or hyperparameters.

Parameters:

params (dict) – A dictionary containing the new parameters for the policy.

Return type:

None

Examples

>>> policy.set_params(params)

abstract get_params() → Dict[str, Any]

Return the current parameters of the policy.

This method should be overridden by subclasses to return the relevant parameters of the policy, such as model weights or hyperparameters.

Returns:

A dictionary containing the current parameters of the policy.

Return type:

dict

Examples

>>> params = policy.get_params()
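
A common use of get_params and set_params together is copying parameters from one policy to another. The sketch below shows such a round trip under one assumption that the interface itself does not state: both methods deep-copy the dictionary so that the caller and the policy never share mutable state. `ParamStore` and its `weights` and `epsilon` keys are hypothetical and chosen only for illustration.

```python
import copy
from typing import Any, Dict


class ParamStore:
    """Hypothetical holder exposing Policy-style get_params / set_params."""

    def __init__(self) -> None:
        self._params: Dict[str, Any] = {"weights": [0.0, 0.0], "epsilon": 0.1}

    def get_params(self) -> Dict[str, Any]:
        # Return a deep copy so callers cannot mutate internal state by accident.
        return copy.deepcopy(self._params)

    def set_params(self, params: Dict[str, Any]) -> None:
        # Copy on the way in for the same reason.
        self._params = copy.deepcopy(params)


source = ParamStore()
target = ParamStore()

snapshot = source.get_params()
snapshot["weights"][0] = 1.5  # modifies the copy, not the source policy
target.set_params(snapshot)

print(source.get_params()["weights"])
print(target.get_params()["weights"])
```

Whether a real implementation copies or shares its parameters is a design choice; sharing is cheaper for large weight tensors but lets later edits to the dictionary leak into the policy.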

abstract train() → None

Set the policy to training mode.

This method should be overridden by subclasses to implement any necessary logic for preparing the policy for training, such as enabling dropout or having batch normalization layers use per-batch statistics.

Return type:

None

Examples

>>> policy.train()

abstract eval() → None

Set the policy to evaluation mode.

This method should be overridden by subclasses to implement any necessary logic for preparing the policy for evaluation, such as disabling dropout or switching batch normalization layers to their running statistics.

Return type:

None

Examples

>>> policy.eval()
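
The train and eval switches typically change how act behaves. As a minimal sketch, the hypothetical `ModeSwitchingPolicy` below explores at random only while in training mode and acts greedily in evaluation mode. It is an illustration rather than anything from the Duo framework: it does not subclass Policy, and its act returns a plain int instead of a torch.Tensor so the snippet runs with the standard library alone.

```python
import random
from typing import Sequence


class ModeSwitchingPolicy:
    """Hypothetical policy whose act() depends on a training flag."""

    def __init__(self, epsilon: float = 0.2, seed: int = 0) -> None:
        self.epsilon = epsilon
        self.training = True
        self._rng = random.Random(seed)

    def train(self) -> None:
        self.training = True

    def eval(self) -> None:
        self.training = False

    def act(self, scores: Sequence[float]) -> int:
        # Explore with probability epsilon, but only in training mode.
        if self.training and self._rng.random() < self.epsilon:
            return self._rng.randrange(len(scores))
        # Otherwise pick the highest-scoring action (greedy).
        return max(range(len(scores)), key=lambda i: scores[i])


policy = ModeSwitchingPolicy(epsilon=1.0)

policy.eval()
eval_actions = [policy.act([0.1, 0.9, 0.3]) for _ in range(5)]

policy.train()
train_actions = [policy.act([0.1, 0.9, 0.3]) for _ in range(50)]
```

In evaluation mode every call returns index 1, the highest score, regardless of epsilon; in training mode with epsilon set to 1.0 every action is drawn at random from the valid indices.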