duo_ai.algorithms
=================

.. py:module:: duo_ai.algorithms


Submodules
----------

.. toctree::
   :maxdepth: 1

   /autoapi/duo_ai/algorithms/always/index
   /autoapi/duo_ai/algorithms/logit/index
   /autoapi/duo_ai/algorithms/ppo/index
   /autoapi/duo_ai/algorithms/pyod/index
   /autoapi/duo_ai/algorithms/random/index


Attributes
----------

.. autoapisummary::

   duo_ai.algorithms.registry


Classes
-------

.. autoapisummary::

   duo_ai.algorithms.AlwaysAlgorithm
   duo_ai.algorithms.LogitAlgorithm
   duo_ai.algorithms.PPOAlgorithm
   duo_ai.algorithms.PyODAlgorithm
   duo_ai.algorithms.RandomAlgorithm


Package Contents
----------------

.. py:class:: AlwaysAlgorithm(config: AlwaysAlgorithmConfig)

   Bases: :py:obj:`duo_ai.core.algorithm.Algorithm`


   Algorithm that always returns the same action, regardless of input.

   .. rubric:: Examples

   >>> algo = AlwaysAlgorithm(AlwaysAlgorithmConfig())


   .. py:attribute:: config_cls


   .. py:method:: train(policy: duo.core.Policy, env: gym.Env, validators: Dict[str, duo.core.Evaluator]) -> None

      Run the AlwaysAlgorithm training procedure.

      This method evaluates the provided policy in the given environment using the specified evaluators.
      The AlwaysAlgorithm always returns the same action, regardless of the input observation.

      :param policy: The policy instance to use for generating actions.
      :type policy: duo.core.Policy
      :param env: The environment in which the policy is evaluated.
      :type env: gym.Env
      :param validators: Dictionary mapping split names to evaluator instances for evaluation.
      :type validators: dict of str to duo.core.Evaluator

      :rtype: None

      .. rubric:: Examples

      >>> algorithm = AlwaysAlgorithm(AlwaysAlgorithmConfig())
      >>> algorithm.train(policy, env, validators)
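
As a rough illustration of the behaviour described for ``AlwaysAlgorithm``, the sketch below pairs a constant-action policy with an evaluation loop over named splits. ``ConstantPolicy``, ``evaluate_splits``, and the callable evaluators are hypothetical stand-ins chosen for this example; they are not part of ``duo_ai``.

```python
from typing import Callable, Dict


class ConstantPolicy:
    """Hypothetical stand-in: returns one fixed action for any observation."""

    def __init__(self, action: int) -> None:
        self.action = action

    def act(self, observation: object) -> int:
        # The observation is ignored on purpose.
        return self.action


def evaluate_splits(
    policy: ConstantPolicy,
    validators: Dict[str, Callable[[ConstantPolicy], float]],
) -> Dict[str, float]:
    """Run every evaluator once and collect one score per split name."""
    return {split: evaluate(policy) for split, evaluate in validators.items()}


# Toy evaluators (assumed): score 1.0 when the fixed action matches a target.
validators = {
    "val": lambda p: float(p.act(None) == 1),
    "test": lambda p: float(p.act([0.3, 0.7]) == 0),
}
results = evaluate_splits(ConstantPolicy(action=1), validators)
```

Because the action never depends on the observation, there is nothing to fit in this sketch; the only work is the per-split evaluation.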


.. py:class:: LogitAlgorithm(config: LogitAlgorithmConfig)

   Bases: :py:obj:`duo_ai.core.Algorithm`


   Algorithm for tuning confidence-based policies using logit thresholds and temperatures.

   .. rubric:: Examples

   >>> algo = LogitAlgorithm(LogitAlgorithmConfig())


   .. py:attribute:: config_cls


   .. py:attribute:: config


   .. py:method:: train(policy: duo.core.Policy, env: gym.Env, validators: Dict[str, duo.core.Evaluator]) -> None

      Train the LogitAlgorithm by searching for the best threshold and temperature parameters
      based on rollout scores and evaluation results.

      :param policy: The policy to be trained and evaluated.
      :type policy: duo.core.Policy
      :param env: The environment used for training and rollouts.
      :type env: gym.Env
      :param validators: Dictionary mapping split names to evaluator instances for evaluation.
      :type validators: dict of str to duo.core.Evaluator

      :rtype: None

      .. rubric:: Examples

      >>> algorithm = LogitAlgorithm(LogitAlgorithmConfig())
      >>> algorithm.train(policy, env, validators)


   .. py:method:: save_checkpoint(policy: duo.core.Policy, name: str) -> None

      Save the current policy configuration and parameters to a checkpoint file.

      :param policy: The policy whose parameters are to be saved.
      :type policy: duo.core.Policy
      :param name: Name for the checkpoint file.
      :type name: str

      :rtype: None

      .. rubric:: Examples

      >>> algorithm = LogitAlgorithm(LogitAlgorithmConfig())
      >>> algorithm.save_checkpoint(policy, "best_test")


   .. py:method:: _generate_scores(env: gym.Env, policy: duo.core.Policy, temperature: float, num_rollouts: int) -> list

      Generate confidence scores by rolling out the policy in the environment.

      :param env: The environment used for rollouts.
      :type env: gym.Env
      :param policy: The policy to be evaluated.
      :type policy: duo.core.Policy
      :param temperature: Temperature parameter for action selection.
      :type temperature: float
      :param num_rollouts: Total number of rollout episodes to generate.
      :type num_rollouts: int

      :returns: **scores** -- List of confidence scores collected from rollouts.
      :rtype: list of float

      .. rubric:: Examples

      >>> algorithm = LogitAlgorithm(LogitAlgorithmConfig())
      >>> scores = algorithm._generate_scores(env, policy, 1.0, 128)
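
The threshold and temperature search described for ``LogitAlgorithm`` could look roughly like the sketch below. It assumes the confidence score is the maximum softmax probability of temperature-scaled logits and that candidate thresholds are taken at percentiles of the rollout scores; ``confidence``, ``search``, and the toy ``evaluate`` callable are hypothetical names, not the library's API.

```python
import math
from typing import Callable, List, Sequence, Tuple


def confidence(logits: Sequence[float], temperature: float) -> float:
    """Max softmax probability after dividing the logits by the temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    return max(exps) / sum(exps)


def search(
    rollout_logits: List[Sequence[float]],
    temperatures: Sequence[float],
    percentiles: Sequence[float],
    evaluate: Callable[[float, float], float],
) -> Tuple[float, float, float]:
    """Return (best_reward, best_threshold, best_temperature)."""
    best = (-math.inf, 0.0, 0.0)
    for temperature in temperatures:
        scores = sorted(confidence(row, temperature) for row in rollout_logits)
        for pct in percentiles:
            # Candidate threshold: the score found at the given percentile.
            index = min(len(scores) - 1, int(pct / 100 * len(scores)))
            threshold = scores[index]
            reward = evaluate(threshold, temperature)
            if reward > best[0]:
                best = (reward, threshold, temperature)
    return best


# Toy data and evaluator (assumed): the evaluator prefers thresholds near 0.6.
logits = [[2.0, 0.0], [0.1, 0.0], [0.0, 3.0], [0.5, 0.4]]
best_reward, best_threshold, best_temperature = search(
    logits,
    temperatures=[0.5, 1.0, 2.0],
    percentiles=[0, 25, 50, 75],
    evaluate=lambda threshold, temperature: -abs(threshold - 0.6),
)
```

Taking thresholds from the observed score distribution keeps the grid inside the range the policy actually produces, which is one plausible reason to generate rollout scores before searching.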


.. py:class:: PPOAlgorithm(config: PPOAlgorithmConfig)

   Bases: :py:obj:`duo_ai.core.Algorithm`


   Proximal Policy Optimization (PPO) algorithm implementation.

   .. rubric:: Examples

   >>> algo = PPOAlgorithm(PPOAlgorithmConfig())
   >>> algo.train(policy, env, validators)


   .. py:attribute:: config_cls


   .. py:attribute:: config


   .. py:method:: _initialize() -> None

      Initialize PPO training state, buffers, optimizer, and logging.

      :rtype: None


   .. py:method:: train(policy: duo.policies.PPOPolicy, env: gymnasium.Env, validators: Dict[str, duo.core.Evaluator]) -> None

      Train the provided policy with PPO on the specified environment.

      This method performs multiple training iterations, periodically evaluates the policy,
      logs statistics, and saves checkpoints for the best and last models.

      :param policy: The policy to be trained.
      :type policy: duo.policies.PPOPolicy
      :param env: The environment instance for training.
      :type env: gymnasium.Env
      :param validators: Dictionary mapping split names to evaluator instances for evaluation.
      :type validators: dict of str to duo.core.Evaluator

      :rtype: None

      .. rubric:: Examples

      >>> algorithm = PPOAlgorithm(PPOAlgorithmConfig())
      >>> algorithm.train(policy, env, validators)


   .. py:method:: _train_once() -> None

      Perform a single training iteration of PPO, including trajectory collection,
      advantage computation, and policy/value updates.

      :rtype: None


   .. py:method:: _update_learning_rate() -> None

      Update the learning rate for the optimizer, optionally annealing it over time.

      :rtype: None


   .. py:method:: _compute_advantages_and_returns() -> Tuple[torch.Tensor, torch.Tensor]

      Compute advantages and returns using Generalized Advantage Estimation (GAE).

      :returns: * **advantages** (*torch.Tensor*) -- Advantage estimates for each step.
                * **returns** (*torch.Tensor*) -- Computed returns for each step.

      .. rubric:: Examples

      >>> adv, ret = algo._compute_advantages_and_returns()


   .. py:method:: save_checkpoint(policy: duo.policies.PPOPolicy, name: str) -> None

      Save the current policy and optimizer state to a checkpoint file.

      :param policy: The policy to save.
      :type policy: duo.policies.PPOPolicy
      :param name: Name for the checkpoint file.
      :type name: str

      :rtype: None

      .. rubric:: Examples

      >>> algo.save_checkpoint(policy, "last")


   .. py:method:: load_checkpoint(policy: duo.policies.PPOPolicy, load_path: str) -> None

      Load policy and optimizer state from a checkpoint file.

      :param policy: The policy to load parameters into.
      :type policy: duo.policies.PPOPolicy
      :param load_path: Path to the checkpoint file.
      :type load_path: str

      :rtype: None

      .. rubric:: Examples

      >>> algo.load_checkpoint(policy, "checkpoint.ckpt")
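
The GAE step named in ``_compute_advantages_and_returns`` can be sketched in plain Python as follows. This is a minimal illustration rather than the library's implementation: the function name ``gae``, its arguments, and the default ``gamma`` and ``gae_lambda`` values are assumptions, and lists stand in for the ``torch.Tensor`` buffers.

```python
from typing import List, Sequence, Tuple


def gae(
    rewards: Sequence[float],
    values: Sequence[float],
    dones: Sequence[bool],
    last_value: float,
    gamma: float = 0.99,
    gae_lambda: float = 0.95,
) -> Tuple[List[float], List[float]]:
    """Generalized Advantage Estimation over one trajectory.

    dones[t] is True when the episode ended after the action at step t,
    so no value is bootstrapped from step t + 1 in that case.
    """
    advantages = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        next_value = last_value if t == len(rewards) - 1 else values[t + 1]
        not_done = 0.0 if dones[t] else 1.0
        # One-step temporal-difference error.
        delta = rewards[t] + gamma * next_value * not_done - values[t]
        # Exponentially weighted sum of future TD errors.
        running = delta + gamma * gae_lambda * not_done * running
        advantages[t] = running
    # Returns are advantages plus the value baseline.
    returns = [a + v for a, v in zip(advantages, values)]
    return advantages, returns


# With gamma = lambda = 1 and zero values, advantages equal the reward-to-go.
adv, ret = gae(
    rewards=[1.0, 1.0, 1.0],
    values=[0.0, 0.0, 0.0],
    dones=[False, False, True],
    last_value=0.0,
    gamma=1.0,
    gae_lambda=1.0,
)
```

Setting ``gae_lambda`` to 0 reduces the estimate to the one-step TD error, while 1 recovers the Monte Carlo return minus the baseline.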


.. py:class:: PyODAlgorithm(config: PyODAlgorithmConfig)

   Bases: :py:obj:`duo_ai.core.Algorithm`


   Algorithm for out-of-distribution (OOD) detection using PyOD models.

   .. rubric:: Examples

   >>> algo = PyODAlgorithm(PyODAlgorithmConfig())


   .. py:attribute:: config_cls


   .. py:attribute:: config


   .. py:attribute:: random


   .. py:method:: train(policy: duo.policies.PPOPolicy, env: gym.Env, validators: Dict[str, duo.core.Evaluator]) -> None

      Train the PyODAlgorithm by searching for the best threshold parameter
      that maximizes evaluation reward.

      :param policy: The policy to be evaluated and tuned.
      :type policy: duo.policies.PPOPolicy
      :param env: The environment instance for training and data generation.
      :type env: gym.Env
      :param validators: Dictionary mapping split names to evaluator instances for evaluation.
      :type validators: dict of str to duo.core.Evaluator

      :rtype: None

      .. rubric:: Examples

      >>> algorithm = PyODAlgorithm(PyODAlgorithmConfig())
      >>> algorithm.train(policy, env, validators)


   .. py:method:: save_checkpoint(policy: duo.policies.PPOPolicy, name: str) -> None

      Save the current policy configuration and parameters to a checkpoint file.

      :param policy: The policy whose parameters are to be saved.
      :type policy: duo.policies.PPOPolicy
      :param name: Name for the checkpoint file.
      :type name: str

      :rtype: None

      .. rubric:: Examples

      >>> algorithm = PyODAlgorithm(PyODAlgorithmConfig())
      >>> algorithm.save_checkpoint(policy, "best_test")


   .. py:method:: _generate_data(env: gym.Env, policy: duo.policies.PPOPolicy, temperature: float, num_rollouts: int, accept_rate: float) -> dict

      Generate data for OOD detection by rolling out the policy in the environment.

      :param env: The environment used for rollouts.
      :type env: gym.Env
      :param policy: The policy to be evaluated.
      :type policy: duo.policies.PPOPolicy
      :param temperature: Temperature parameter for action selection.
      :type temperature: float
      :param num_rollouts: Total number of rollout episodes to generate.
      :type num_rollouts: int
      :param accept_rate: Acceptance rate for sampling data during rollouts.
      :type accept_rate: float

      :returns: **data** -- Dictionary containing collected data arrays for each feature.
      :rtype: dict

      .. rubric:: Examples

      >>> algorithm = PyODAlgorithm(PyODAlgorithmConfig())
      >>> data = algorithm._generate_data(env, policy, 1.0, 128, 0.05)
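
The data-generation and thresholding idea behind ``PyODAlgorithm`` might be sketched as below, assuming that ``accept_rate`` keeps each rollout step independently with that probability. ``subsample`` and ``ZScoreDetector`` are hypothetical stand-ins written for this example; the detector only mimics the ``fit`` / ``decision_function`` interface that PyOD detectors expose, and the threshold value is an assumption rather than a searched result.

```python
import random
from statistics import mean, pstdev
from typing import List


def subsample(
    features: List[List[float]], accept_rate: float, rng: random.Random
) -> List[List[float]]:
    """Keep each rollout step independently with probability accept_rate."""
    return [row for row in features if rng.random() < accept_rate]


class ZScoreDetector:
    """Stand-in outlier detector: larger scores mean more unusual inputs."""

    def fit(self, rows: List[List[float]]) -> "ZScoreDetector":
        columns = list(zip(*rows))
        self.means = [mean(c) for c in columns]
        # Guard against zero spread so the division below is always defined.
        self.stds = [pstdev(c) or 1.0 for c in columns]
        return self

    def decision_function(self, rows: List[List[float]]) -> List[float]:
        # Score each row by its largest per-feature absolute z-score.
        return [
            max(abs(x - m) / s for x, m, s in zip(row, self.means, self.stds))
            for row in rows
        ]


rng = random.Random(0)
# Synthetic "rollout" features (assumed): two Gaussian-distributed columns.
steps = [[rng.gauss(0.0, 1.0), rng.gauss(5.0, 2.0)] for _ in range(2000)]
train = subsample(steps, accept_rate=0.05, rng=rng)
detector = ZScoreDetector().fit(train)

scores = detector.decision_function([[0.0, 5.0], [8.0, 5.0]])
threshold = 3.0  # assumed value; the documented search picks it by reward
flags = [score > threshold for score in scores]
```

Subsampling keeps the training set small when rollouts produce many highly correlated steps, which is one plausible motivation for an acceptance rate.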


.. py:class:: RandomAlgorithm(config: RandomAlgorithmConfig)

   Bases: :py:obj:`duo_ai.core.Algorithm`


   Algorithm that searches for the best probability parameter to maximize evaluation reward.

   .. rubric:: Examples

   >>> algo = RandomAlgorithm(RandomAlgorithmConfig())


   .. py:attribute:: config_cls


   .. py:attribute:: config


   .. py:method:: train(policy: duo.policies.PPOPolicy, env: gym.Env, validators: Dict[str, duo.core.Evaluator]) -> None

      Train the RandomAlgorithm by searching for the best probability parameter that maximizes evaluation reward.

      :param policy: The policy to be evaluated and tuned.
      :type policy: duo.policies.PPOPolicy
      :param env: The environment instance for training and data generation.
      :type env: gym.Env
      :param validators: Dictionary mapping split names to evaluator instances for evaluation.
      :type validators: dict of str to duo.core.Evaluator

      :rtype: None

      .. rubric:: Examples

      >>> algorithm = RandomAlgorithm(RandomAlgorithmConfig())
      >>> algorithm.train(policy, env, validators)


   .. py:method:: save_checkpoint(policy: duo.policies.PPOPolicy, name: str) -> None

      Save the current policy configuration and parameters to a checkpoint file.

      :param policy: The policy whose parameters are to be saved.
      :type policy: duo.policies.PPOPolicy
      :param name: Name for the checkpoint file.
      :type name: str

      :rtype: None

      .. rubric:: Examples

      >>> algorithm = RandomAlgorithm(RandomAlgorithmConfig())
      >>> algorithm.save_checkpoint(policy, "best_test")
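
The probability search described for ``RandomAlgorithm`` could be sketched as a grid search over candidate values. ``RandomPolicy``, ``search_probability``, and ``toy_evaluate`` are hypothetical names, and the toy reward (closeness of the empirical action rate to 0.3) is invented purely so the example has a clear optimum.

```python
import random
from typing import Callable, Sequence, Tuple


class RandomPolicy:
    """Hypothetical policy: returns action 1 with probability prob, else 0."""

    def __init__(self, prob: float, seed: int = 0) -> None:
        self.prob = prob
        self.rng = random.Random(seed)

    def act(self, observation: object = None) -> int:
        return 1 if self.rng.random() < self.prob else 0


def search_probability(
    candidates: Sequence[float],
    evaluate: Callable[[float], float],
) -> Tuple[float, float]:
    """Return (best_probability, best_reward) over the candidate grid."""
    best_prob, best_reward = candidates[0], evaluate(candidates[0])
    for prob in candidates[1:]:
        reward = evaluate(prob)
        if reward > best_reward:
            best_prob, best_reward = prob, reward
    return best_prob, best_reward


def toy_evaluate(prob: float, steps: int = 5000) -> float:
    """Toy reward (assumed): highest when action 1 is chosen ~30% of the time."""
    policy = RandomPolicy(prob)
    rate = sum(policy.act() for _ in range(steps)) / steps
    return -abs(rate - 0.3)


grid = [i / 10 for i in range(11)]
best_prob, best_reward = search_probability(grid, toy_evaluate)
```

Seeding the policy inside the evaluator makes every candidate see the same random draws, so differences in reward come from the probability parameter rather than from sampling noise.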


.. py:data:: registry