duo_ai.policies.random
======================

.. py:module:: duo_ai.policies.random


Classes
-------

.. autoapisummary::

   duo_ai.policies.random.RandomPolicyConfig
   duo_ai.policies.random.RandomPolicy


Module Contents
---------------

.. py:class:: RandomPolicyConfig

   Configuration dataclass for RandomPolicy.

   :param name: Name of the policy. Default is "random".
   :type name: str, optional
   :param prob: Probability of selecting the expert action. Default is None. Setting this value prevents RandomAlgorithm from running a grid search.
   :type prob: float, optional
   :param load_path: Path to a checkpoint to load. Default is None.
   :type load_path: str, optional

   .. attribute:: name

      Name of the policy.

      :type: str

   .. attribute:: prob

      Probability of selecting the expert action.

      :type: float or None

   .. attribute:: load_path

      Path to a checkpoint to load.

      :type: str or None

   .. rubric:: Examples

   >>> config = RandomPolicyConfig(prob=0.7)
   >>> config.name
   'random'
   >>> print(config.prob)
   0.7


   .. py:attribute:: name
      :type:  str
      :value: 'random'


   .. py:attribute:: prob
      :type:  Optional[float]
      :value: None


   .. py:attribute:: load_path
      :type:  Optional[str]
      :value: None
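
   According to the parameter description, setting ``prob`` prevents RandomAlgorithm from conducting a grid search. The sketch below mirrors the documented fields with a stand-in dataclass and expresses that rule as a predicate. ``RandomPolicyConfigSketch`` and ``needs_grid_search`` are hypothetical names used only for illustration; they are not part of ``duo_ai``, and the real decision logic may differ.

   .. code-block:: python

      from dataclasses import dataclass
      from typing import Optional


      @dataclass
      class RandomPolicyConfigSketch:
          # Hypothetical stand-in mirroring the documented fields and defaults.
          name: str = "random"
          prob: Optional[float] = None
          load_path: Optional[str] = None


      def needs_grid_search(config: RandomPolicyConfigSketch) -> bool:
          # Assumption: a grid search is only needed when prob is left unset.
          return config.prob is None


      print(needs_grid_search(RandomPolicyConfigSketch()))          # True
      print(needs_grid_search(RandomPolicyConfigSketch(prob=0.7)))  # False
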


.. py:class:: RandomPolicy(config: RandomPolicyConfig, env: gym.Env)

   Bases: :py:obj:`duo_ai.core.policy.Policy`


   Policy that selects the expert action with a fixed probability, given by ``prob``.

   .. rubric:: Examples

   >>> policy = RandomPolicy(RandomPolicyConfig(prob=0.7), env)
   >>> obs = ...
   >>> action = policy.act(obs)
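
   Assuming each selection is an independent draw, the number of expert selections in a batch of ``n`` observations follows a Binomial(``n``, ``prob``) distribution with mean ``n * prob``. The NumPy sketch below checks this empirically. Modelling the selection as a Bernoulli draw is an assumption about the implementation, not a statement of it.

   .. code-block:: python

      import numpy as np

      # Assumption: each action is an independent Bernoulli(prob) draw.
      rng = np.random.default_rng(0)
      prob, n = 0.7, 10_000
      is_expert = rng.random(n) < prob  # True where the expert is chosen

      expected = n * prob               # Binomial mean: n * prob
      observed = int(is_expert.sum())   # empirical count of expert choices
      print(expected, observed)
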


   .. py:attribute:: config_cls


   .. py:attribute:: prob


   .. py:attribute:: device


   .. py:attribute:: EXPERT


   .. py:attribute:: config


   .. py:method:: act(obs: object, temperature: Optional[float] = None) -> torch.Tensor

      Select one action per observation, choosing the expert action with probability ``prob``.

      :param obs: Batch of observations. If a dict, it must contain the key ``'base_obs'``.
      :type obs: dict or np.ndarray
      :param temperature: Unused. Included for API compatibility.
      :type temperature: float, optional

      :returns: Tensor with one selected action per observation in the batch; each entry is either the expert action or the non-expert action.
      :rtype: torch.Tensor

      :raises ValueError: If obs is not a dict or numpy array.

      .. rubric:: Examples

      >>> action = policy.act(obs)
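
      The documented behavior of ``act`` can be sketched with NumPy as follows. This is not the ``duo_ai`` implementation: ``act_sketch`` is a hypothetical function, the encoding ``EXPERT = 1`` (with ``0`` for the non-expert action) is an assumption, and the sketch returns a NumPy array where the real method returns a ``torch.Tensor``. Like the documented method, it ignores any temperature and raises ``ValueError`` for unsupported observation types.

      .. code-block:: python

         import numpy as np

         EXPERT = 1      # assumption: integer code for the expert action
         NON_EXPERT = 0  # assumption: integer code for the other action


         def act_sketch(obs, prob: float, rng: np.random.Generator) -> np.ndarray:
             # Unwrap dict observations, which carry the batch under 'base_obs'.
             if isinstance(obs, dict):
                 batch = obs["base_obs"]
             elif isinstance(obs, np.ndarray):
                 batch = obs
             else:
                 raise ValueError("obs must be a dict or a numpy array")
             batch_size = len(batch)
             # One independent draw per observation: expert with probability prob.
             choose_expert = rng.random(batch_size) < prob
             return np.where(choose_expert, EXPERT, NON_EXPERT)


         rng = np.random.default_rng(0)
         obs = {"base_obs": np.zeros((4, 3))}
         print(act_sketch(obs, prob=1.0, rng=rng))  # [1 1 1 1]
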


   .. py:method:: reset(done: numpy.ndarray) -> None

      Reset the policy state at episode boundaries.

      :param done: Boolean array indicating which episodes in a batch require a reset.
      :type done: np.ndarray

      :rtype: None

      .. rubric:: Examples

      >>> policy.reset(done)
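
      To show what the ``done`` mask is for, the sketch below uses a hypothetical stateful policy, ``StepCounterSketch`` (not part of ``duo_ai``), that clears per-episode state only where ``done`` is True. Whether ``RandomPolicy`` itself keeps any such state is not stated here; if every action is drawn independently, its reset may have nothing to clear.

      .. code-block:: python

         import numpy as np


         class StepCounterSketch:
             # Hypothetical policy that tracks how many steps each episode has run.
             def __init__(self, num_envs: int) -> None:
                 self.steps = np.zeros(num_envs, dtype=int)

             def act(self) -> None:
                 self.steps += 1  # every environment advances by one step

             def reset(self, done: np.ndarray) -> None:
                 # Clear the state only for episodes that just finished.
                 self.steps[np.asarray(done, dtype=bool)] = 0


         policy = StepCounterSketch(num_envs=3)
         policy.act()
         policy.act()
         policy.reset(np.array([False, True, False]))
         print(policy.steps)  # [2 0 2]
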


   .. py:method:: set_params(params: dict) -> None

      Set the parameters of the policy, such as ``prob``.

      :param params: Dictionary of policy parameters to set.
      :type params: dict

      :rtype: None

      .. rubric:: Examples

      >>> policy.set_params({'prob': 0.5})
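
      The ``RandomPolicyConfig`` documentation notes that leaving ``prob`` unset lets RandomAlgorithm conduct a grid search. A minimal sketch of such a search, driven through ``set_params``, follows. ``ParamPolicySketch``, ``grid_search``, and ``toy_score`` are hypothetical and do not describe how RandomAlgorithm actually selects ``prob``.

      .. code-block:: python

         from typing import Callable, Dict, Iterable, Optional, Tuple


         class ParamPolicySketch:
             # Hypothetical policy exposing set_params/get_params for a single 'prob'.
             def __init__(self) -> None:
                 self.prob = 0.0

             def set_params(self, params: Dict[str, float]) -> None:
                 self.prob = params["prob"]

             def get_params(self) -> Dict[str, float]:
                 return {"prob": self.prob}


         def grid_search(
             policy: ParamPolicySketch,
             candidates: Iterable[float],
             score: Callable[[ParamPolicySketch], float],
         ) -> Tuple[Optional[float], float]:
             # Try each candidate probability and keep the best-scoring one.
             best_prob, best_score = None, float("-inf")
             for p in candidates:
                 policy.set_params({"prob": p})
                 s = score(policy)
                 if s > best_score:
                     best_prob, best_score = p, s
             if best_prob is not None:
                 policy.set_params({"prob": best_prob})  # leave the winner in place
             return best_prob, best_score


         def toy_score(policy: ParamPolicySketch) -> float:
             # Illustrative objective that peaks at prob = 0.3; not a real metric.
             return -(policy.prob - 0.3) ** 2


         policy = ParamPolicySketch()
         best_prob, _ = grid_search(policy, [0.0, 0.1, 0.3, 0.5, 1.0], toy_score)
         print(best_prob)  # 0.3
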


   .. py:method:: get_params() -> dict

      Get the current parameters of the policy.

      :returns: Dictionary of policy parameters.
      :rtype: dict

      .. rubric:: Examples

      >>> params = policy.get_params()
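
      Because ``get_params`` returns a plain dictionary, it pairs naturally with ``set_params`` for saving and restoring a tuned value. The sketch below round-trips the parameters through JSON using a hypothetical stand-in class, ``ParamPolicySketch``. The checkpoint format that ``duo_ai`` reads from ``load_path`` is not documented on this page and may differ.

      .. code-block:: python

         import json
         from typing import Dict


         class ParamPolicySketch:
             # Hypothetical stand-in with the same get/set parameter interface.
             def __init__(self, prob: float = 0.0) -> None:
                 self.prob = prob

             def get_params(self) -> Dict[str, float]:
                 return {"prob": self.prob}

             def set_params(self, params: Dict[str, float]) -> None:
                 self.prob = params["prob"]


         tuned = ParamPolicySketch(prob=0.4)
         saved = json.dumps(tuned.get_params())  # serialize, e.g. to write to disk

         restored = ParamPolicySketch()
         restored.set_params(json.loads(saved))  # restore the tuned value
         print(restored.get_params())  # {'prob': 0.4}
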


   .. py:method:: train() -> None

      Set the policy to training mode.

      :rtype: None

      .. rubric:: Examples

      >>> policy.train()


   .. py:method:: eval() -> None

      Set the policy to evaluation mode.

      :rtype: None

      .. rubric:: Examples

      >>> policy.eval()
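
      ``train`` and ``eval`` follow the naming of ``torch.nn.Module.train`` and ``torch.nn.Module.eval``. Since the only documented policy parameter is ``prob``, the sketch below, a hypothetical ``ModePolicySketch``, only records the current mode in a ``training`` flag. Whether ``RandomPolicy`` changes how it samples actions between the two modes is not stated on this page.

      .. code-block:: python

         class ModePolicySketch:
             # Hypothetical policy that only tracks which mode it is in.
             def __init__(self) -> None:
                 self.training = True

             def train(self) -> None:
                 self.training = True

             def eval(self) -> None:
                 self.training = False


         policy = ModePolicySketch()
         policy.eval()
         print(policy.training)  # False
         policy.train()
         print(policy.training)  # True
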