duo_ai.policies.random
======================

.. py:module:: duo_ai.policies.random


Classes
-------

.. autoapisummary::

   duo_ai.policies.random.RandomPolicyConfig
   duo_ai.policies.random.RandomPolicy


Module Contents
---------------

.. py:class:: RandomPolicyConfig

   Configuration dataclass for RandomPolicy.

   :param cls: Name of the policy class. Default is "RandomPolicy".
   :type cls: str, optional
   :param prob: Probability of selecting the expert action. Setting this value
                prevents RandomAlgorithm from conducting a grid search.
                Default is None.
   :type prob: float, optional
   :param load_path: Path to a checkpoint to load. Default is None.
   :type load_path: str, optional

   .. attribute:: cls

      Name of the policy class.

      :type: str

   .. attribute:: prob

      Probability of selecting the expert action.

      :type: float or None

   .. attribute:: load_path

      Path to a checkpoint to load.

      :type: str or None

   .. rubric:: Examples

   >>> config = RandomPolicyConfig(prob=0.7)
   >>> print(config.cls)
   RandomPolicy
   >>> print(config.prob)
   0.7


   .. py:attribute:: name
      :type:  str
      :value: 'random'


   .. py:attribute:: prob
      :type:  Optional[float]
      :value: None


   .. py:attribute:: load_path
      :type:  Optional[str]
      :value: None


.. py:class:: RandomPolicy(config: RandomPolicyConfig, env: gym.Env)

   Bases: :py:obj:`duo_ai.core.policy.Policy`


   Policy that selects the expert action with a fixed probability.

   .. rubric:: Examples

   >>> policy = RandomPolicy(RandomPolicyConfig(prob=0.7), env)
   >>> obs = ...
   >>> action = policy.act(obs)


   .. py:attribute:: config_cls


   .. py:attribute:: prob


   .. py:attribute:: device


   .. py:attribute:: EXPERT


   .. py:attribute:: config


   .. py:method:: act(obs: object, temperature: Optional[float] = None) -> torch.Tensor

      Select actions randomly based on the configured probability.

      :param obs: Batch of observations. If a dict, it must contain the key 'base_obs'.
      :type obs: dict or np.ndarray
      :param temperature: Unused. Included for API compatibility.
      :type temperature: float, optional

      :returns: Tensor of selected actions (expert or not) for the batch.
      :rtype: torch.Tensor

      :raises ValueError: If obs is not a dict or numpy array.

      .. rubric:: Examples

      >>> action = policy.act(obs)


   .. py:method:: reset(done: numpy.ndarray) -> None

      Reset the policy state at episode boundaries.

      :param done: Boolean array indicating which episodes in a batch require a reset.
      :type done: np.ndarray

      :rtype: None

      .. rubric:: Examples

      >>> policy.reset(done)


   .. py:method:: set_params(params: dict) -> None

      Set the parameters of the policy.

      :param params: Dictionary of policy parameters to set.
      :type params: dict

      :rtype: None

      .. rubric:: Examples

      >>> policy.set_params({'prob': 0.5})


   .. py:method:: get_params() -> dict

      Get the current parameters of the policy.

      :returns: Dictionary of policy parameters.
      :rtype: dict

      .. rubric:: Examples

      >>> params = policy.get_params()


   .. py:method:: train() -> None

      Set the policy to training mode.

      :rtype: None

      .. rubric:: Examples

      >>> policy.train()


   .. py:method:: eval() -> None

      Set the policy to evaluation mode.

      :rtype: None

      .. rubric:: Examples

      >>> policy.eval()
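The behaviour described for ``act`` (choose the expert action with a fixed probability, independently for each observation in the batch) can be illustrated with a small stand-in that does not depend on ``duo_ai``, ``gym`` or ``torch``. This is a minimal sketch under stated assumptions, not the library's implementation: ``SketchRandomPolicy``, the ``NOVICE``/``EXPERT`` integer encoding, the ``seed`` argument and the use of plain lists instead of ``np.ndarray``/``torch.Tensor`` are all hypothetical.

```python
import random
from typing import List, Optional

# Hypothetical action encoding, for illustration only; the actual value of
# RandomPolicy.EXPERT is not stated in this reference.
NOVICE = 0
EXPERT = 1


class SketchRandomPolicy:
    """Stand-in that mimics the documented behaviour using plain Python lists."""

    def __init__(self, prob: float, seed: Optional[int] = None) -> None:
        if not 0.0 <= prob <= 1.0:
            raise ValueError("prob must lie in [0, 1]")
        self.prob = prob
        self._rng = random.Random(seed)  # seed is an assumption, for repeatability

    def act(self, obs, temperature: Optional[float] = None) -> List[int]:
        # Accept either a dict holding 'base_obs' or a bare batch (assumed: a list).
        if isinstance(obs, dict):
            batch = obs["base_obs"]
        elif isinstance(obs, (list, tuple)):
            batch = obs
        else:
            raise ValueError("obs must be a dict or a sequence of observations")
        # One independent Bernoulli(prob) draw per observation in the batch.
        # `temperature` is ignored, matching the documented signature.
        return [EXPERT if self._rng.random() < self.prob else NOVICE for _ in batch]

    def get_params(self) -> dict:
        return {"prob": self.prob}

    def set_params(self, params: dict) -> None:
        self.prob = params["prob"]


policy = SketchRandomPolicy(prob=0.7, seed=0)
actions = policy.act({"base_obs": [[0.0]] * 10_000})
expert_rate = sum(a == EXPERT for a in actions) / len(actions)
```

With a large batch, ``expert_rate`` should land close to the configured ``prob``. Setting ``prob`` to 0.0 or 1.0 makes the stand-in deterministic, which is a convenient sanity check when comparing against other policies.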