duo_ai.algorithms
=================

.. py:module:: duo_ai.algorithms


Submodules
----------

.. toctree::
   :maxdepth: 1

   /autoapi/duo_ai/algorithms/always/index
   /autoapi/duo_ai/algorithms/logit/index
   /autoapi/duo_ai/algorithms/ppo/index
   /autoapi/duo_ai/algorithms/pyod/index
   /autoapi/duo_ai/algorithms/random/index


Attributes
----------

.. autoapisummary::

   duo_ai.algorithms.registry


Classes
-------

.. autoapisummary::

   duo_ai.algorithms.AlwaysAlgorithm
   duo_ai.algorithms.LogitAlgorithm
   duo_ai.algorithms.PPOAlgorithm
   duo_ai.algorithms.PyODAlgorithm
   duo_ai.algorithms.RandomAlgorithm


Package Contents
----------------

.. py:class:: AlwaysAlgorithm(config: AlwaysAlgorithmConfig)

   Bases: :py:obj:`duo_ai.core.algorithm.Algorithm`

   Algorithm that always returns the same action, regardless of the input.

   .. rubric:: Examples

   >>> algo = AlwaysAlgorithm(AlwaysAlgorithmConfig())


   .. py:attribute:: config_cls


   .. py:method:: train(policy: duo.core.Policy, env: gym.Env, validators: Dict[str, duo.core.Evaluator]) -> None

      Run the AlwaysAlgorithm training procedure.

      This method evaluates the provided policy in the given environment using
      the specified evaluators. The AlwaysAlgorithm always returns the same
      action, regardless of the input observation.

      :param policy: The policy instance used to generate actions.
      :type policy: duo.core.Policy
      :param env: The environment in which the policy is evaluated.
      :type env: gym.Env
      :param validators: Dictionary mapping split names to evaluator instances for evaluation.
      :type validators: dict of str to duo.core.Evaluator

      :rtype: None

      .. rubric:: Examples

      >>> algorithm = AlwaysAlgorithm(AlwaysAlgorithmConfig())
      >>> algorithm.train(policy, env, validators)


.. py:class:: LogitAlgorithm(config: LogitAlgorithmConfig)

   Bases: :py:obj:`duo_ai.core.Algorithm`

   Algorithm for tuning confidence-based policies using logit thresholds and temperatures.

   .. rubric:: Examples

   >>> algo = LogitAlgorithm(LogitAlgorithmConfig())


   .. py:attribute:: config_cls


   .. py:attribute:: config


   .. py:method:: train(policy: duo.core.Policy, env: gym.Env, validators: Dict[str, duo.core.Evaluator]) -> None

      Train the LogitAlgorithm by searching for the threshold and temperature
      parameters that perform best, based on rollout scores and evaluation results.

      :param policy: The policy to be trained and evaluated.
      :type policy: duo.core.Policy
      :param env: The environment used for training and rollouts.
      :type env: gym.Env
      :param validators: Dictionary mapping split names to evaluator instances for evaluation.
      :type validators: dict of str to duo.core.Evaluator

      :rtype: None

      .. rubric:: Examples

      >>> algorithm = LogitAlgorithm(LogitAlgorithmConfig())
      >>> algorithm.train(policy, env, validators)


   .. py:method:: save_checkpoint(policy: duo.core.Policy, name: str) -> None

      Save the current policy configuration and parameters to a checkpoint file.

      :param policy: The policy whose parameters are to be saved.
      :type policy: duo.core.Policy
      :param name: Name for the checkpoint file.
      :type name: str

      :rtype: None

      .. rubric:: Examples

      >>> algorithm.save_checkpoint(policy, "best_test")


   .. py:method:: _generate_scores(env: gym.Env, policy: duo.core.Policy, temperature: float, num_rollouts: int) -> list

      Generate confidence scores by rolling out the policy in the environment.

      :param env: The environment used for rollouts.
      :type env: gym.Env
      :param policy: The policy to be evaluated.
      :type policy: duo.core.Policy
      :param temperature: Temperature parameter for action selection.
      :type temperature: float
      :param num_rollouts: Total number of rollout episodes to generate.
      :type num_rollouts: int

      :returns: **scores** -- List of confidence scores collected from rollouts.
      :rtype: list of float

      .. rubric:: Examples

      >>> scores = algorithm._generate_scores(env, policy, 1.0, 128)


.. py:class:: PPOAlgorithm(config: PPOAlgorithmConfig)

   Bases: :py:obj:`duo_ai.core.Algorithm`

   Proximal Policy Optimization (PPO) algorithm implementation.

   .. rubric:: Examples

   >>> algo = PPOAlgorithm(PPOAlgorithmConfig())
   >>> algo.train(policy, env, validators)


   .. py:attribute:: config_cls


   .. py:attribute:: config


   .. py:method:: _initialize() -> None

      Initialize the PPO training state, buffers, optimizer, and logging.

      :rtype: None


   .. py:method:: train(policy: duo.policies.PPOPolicy, env: gymnasium.Env, validators: Dict[str, duo.core.Evaluator]) -> None

      Train the provided policy with PPO on the given environment.

      This method runs multiple training iterations, periodically evaluates the
      policy, logs statistics, and saves checkpoints for the best and last models.

      :param policy: The policy to be trained.
      :type policy: duo.policies.PPOPolicy
      :param env: The environment instance for training.
      :type env: gym.Env
      :param validators: Dictionary mapping split names to evaluator instances for evaluation.
      :type validators: dict of str to duo.core.Evaluator

      :rtype: None

      .. rubric:: Examples

      >>> algo.train(policy, env, validators)


   .. py:method:: _train_once() -> None

      Perform a single PPO training iteration: collect trajectories, compute
      advantages, and update the policy and value function.

      :rtype: None


   .. py:method:: _update_learning_rate() -> None

      Update the optimizer's learning rate, optionally annealing it over time.

      :rtype: None


   .. py:method:: _compute_advantages_and_returns() -> Tuple[torch.Tensor, torch.Tensor]

      Compute advantages and returns using Generalized Advantage Estimation (GAE).

      :returns: * **advantages** (*torch.Tensor*) -- Advantage estimates for each step.
                * **returns** (*torch.Tensor*) -- Computed returns for each step.

      .. rubric:: Examples

      >>> adv, ret = algo._compute_advantages_and_returns()


   .. py:method:: save_checkpoint(policy: duo.policies.PPOPolicy, name: str) -> None

      Save the current policy and optimizer state to a checkpoint file.

      :param policy: The policy to save.
      :type policy: duo.policies.PPOPolicy
      :param name: Name for the checkpoint file.
      :type name: str

      :rtype: None

      .. rubric:: Examples

      >>> algo.save_checkpoint(policy, "last")


   .. py:method:: load_checkpoint(policy: duo.policies.PPOPolicy, load_path: str) -> None

      Load the policy and optimizer state from a checkpoint file.

      :param policy: The policy to load parameters into.
      :type policy: duo.policies.PPOPolicy
      :param load_path: Path to the checkpoint file.
      :type load_path: str

      :rtype: None

      .. rubric:: Examples

      >>> algo.load_checkpoint(policy, "checkpoint.ckpt")


.. py:class:: PyODAlgorithm(config: PyODAlgorithmConfig)

   Bases: :py:obj:`duo_ai.core.Algorithm`

   Algorithm for out-of-distribution (OOD) detection using PyOD models.

   .. rubric:: Examples

   >>> algo = PyODAlgorithm(PyODAlgorithmConfig())


   .. py:attribute:: config_cls


   .. py:attribute:: config


   .. py:attribute:: random


   .. py:method:: train(policy: duo.policies.PPOPolicy, env: gym.Env, validators: Dict[str, duo.core.Evaluator]) -> None

      Train the PyODAlgorithm by searching for the threshold parameter that
      maximizes evaluation reward.

      :param policy: The policy to be evaluated and tuned.
      :type policy: duo.policies.PPOPolicy
      :param env: The environment instance for training and data generation.
      :type env: gym.Env
      :param validators: Dictionary mapping split names to evaluator instances for evaluation.
      :type validators: dict of str to duo.core.Evaluator

      :rtype: None

      .. rubric:: Examples

      >>> algorithm = PyODAlgorithm(PyODAlgorithmConfig())
      >>> algorithm.train(policy, env, validators)


   .. py:method:: save_checkpoint(policy: duo.policies.PPOPolicy, name: str) -> None

      Save the current policy configuration and parameters to a checkpoint file.

      :param policy: The policy whose parameters are to be saved.
      :type policy: duo.policies.PPOPolicy
      :param name: Name for the checkpoint file.
      :type name: str

      :rtype: None

      .. rubric:: Examples

      >>> algorithm.save_checkpoint(policy, "best_test")


   .. py:method:: _generate_data(env: gym.Env, policy: duo.policies.PPOPolicy, temperature: float, num_rollouts: int, accept_rate: float) -> dict

      Generate data for OOD detection by rolling out the policy in the environment.

      :param env: The environment used for rollouts.
      :type env: gym.Env
      :param policy: The policy to be evaluated.
      :type policy: duo.policies.PPOPolicy
      :param temperature: Temperature parameter for action selection.
      :type temperature: float
      :param num_rollouts: Total number of rollout episodes to generate.
      :type num_rollouts: int
      :param accept_rate: Acceptance rate for sampling data during rollouts.
      :type accept_rate: float

      :returns: **data** -- Dictionary containing collected data arrays for each feature.
      :rtype: dict

      .. rubric:: Examples

      >>> data = algorithm._generate_data(env, policy, 1.0, 128, 0.05)


.. py:class:: RandomAlgorithm(config: RandomAlgorithmConfig)

   Bases: :py:obj:`duo_ai.core.Algorithm`

   Algorithm that searches for the probability parameter that maximizes evaluation reward.

   .. rubric:: Examples

   >>> algo = RandomAlgorithm(RandomAlgorithmConfig())


   .. py:attribute:: config_cls


   .. py:attribute:: config


   .. py:method:: train(policy: duo.policies.PPOPolicy, env: gym.Env, validators: Dict[str, duo.core.Evaluator]) -> None

      Train the RandomAlgorithm by searching for the probability parameter that
      maximizes evaluation reward.

      :param policy: The policy to be evaluated and tuned.
      :type policy: duo.policies.PPOPolicy
      :param env: The environment instance for training and data generation.
      :type env: gym.Env
      :param validators: Dictionary mapping split names to evaluator instances for evaluation.
      :type validators: dict of str to duo.core.Evaluator

      :rtype: None

      .. rubric:: Examples

      >>> algorithm = RandomAlgorithm(RandomAlgorithmConfig())
      >>> algorithm.train(policy, env, validators)


   .. py:method:: save_checkpoint(policy: duo.policies.PPOPolicy, name: str) -> None

      Save the current policy configuration and parameters to a checkpoint file.

      :param policy: The policy whose parameters are to be saved.
      :type policy: duo.policies.PPOPolicy
      :param name: Name for the checkpoint file.
      :type name: str

      :rtype: None

      .. rubric:: Examples

      >>> algorithm.save_checkpoint(policy, "best_test")


.. py:data:: registry
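
The ``LogitAlgorithm`` entries describe collecting confidence scores at a given temperature and then searching over thresholds, but they do not say how a score is computed. The sketch below assumes, purely for illustration, that the score is the maximum softmax probability of temperature-scaled logits and that candidate thresholds are drawn from quantiles of the collected scores. The helpers ``max_softmax_confidence`` and ``candidate_thresholds`` are hypothetical and not part of ``duo_ai``.

```python
import math
from typing import List, Sequence


def max_softmax_confidence(logits: Sequence[float], temperature: float = 1.0) -> float:
    """Hypothetical score: largest softmax probability of logits / temperature."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    # Subtract the max before exponentiating for numerical stability.
    exps = [math.exp(x - m) for x in scaled]
    return max(exps) / sum(exps)


def candidate_thresholds(scores: List[float], num_candidates: int = 5) -> List[float]:
    """Hypothetical grid: evenly spaced quantiles of the observed scores."""
    ordered = sorted(scores)
    n = len(ordered)
    return [ordered[min(n - 1, (i * n) // num_candidates)] for i in range(num_candidates)]


# Toy usage: confidences for three made-up logit vectors at temperature 1.0.
scores = [max_softmax_confidence(l) for l in ([2.0, 0.0], [0.5, 0.4], [3.0, -1.0])]
thresholds = candidate_thresholds(scores, num_candidates=3)
```

Raising the temperature flattens the softmax and lowers every confidence, which is why the threshold and temperature are searched jointly in such a scheme.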
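
``PPOAlgorithm._update_learning_rate`` is documented as optionally annealing the learning rate over time, without stating the schedule. A common choice in PPO implementations is linear decay toward zero over the training run; the sketch below assumes that schedule and a 1-indexed iteration counter. The function name ``annealed_learning_rate`` is hypothetical.

```python
def annealed_learning_rate(
    initial_lr: float, iteration: int, num_iterations: int, anneal: bool = True
) -> float:
    """Hypothetical linear schedule from initial_lr down toward 0.

    Iteration 1 uses initial_lr; the final iteration uses initial_lr / num_iterations.
    """
    if not anneal:
        # Annealing disabled: keep the learning rate constant.
        return initial_lr
    frac = 1.0 - (iteration - 1.0) / num_iterations
    return frac * initial_lr
```

With a PyTorch optimizer, the returned value would typically be written into every entry of ``optimizer.param_groups`` (``group["lr"] = lr``) at the start of each iteration.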
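
``PPOAlgorithm._compute_advantages_and_returns`` uses Generalized Advantage Estimation. As a minimal, framework-free sketch of that technique, assuming plain Python lists instead of ``torch.Tensor``, a single environment, and ``gamma``/``gae_lambda`` defaults chosen for illustration rather than taken from ``PPOAlgorithmConfig``, GAE computes one-step TD residuals and accumulates them backwards in time. The function name ``compute_gae`` is hypothetical.

```python
from typing import List, Tuple


def compute_gae(
    rewards: List[float],
    values: List[float],
    dones: List[bool],
    last_value: float,
    gamma: float = 0.99,
    gae_lambda: float = 0.95,
) -> Tuple[List[float], List[float]]:
    """Hypothetical GAE helper for one trajectory segment.

    dones[t] is True if the episode ended after step t; the value of the next
    state is then not bootstrapped. last_value is V(s_T) for the state after
    the final step.
    """
    num_steps = len(rewards)
    advantages = [0.0] * num_steps
    last_gae = 0.0
    # Iterate backwards so each A_t can reuse A_{t+1}.
    for t in reversed(range(num_steps)):
        next_value = last_value if t == num_steps - 1 else values[t + 1]
        not_done = 0.0 if dones[t] else 1.0
        # TD residual: delta_t = r_t + gamma * V(s_{t+1}) - V(s_t)
        delta = rewards[t] + gamma * next_value * not_done - values[t]
        # A_t = delta_t + gamma * lambda * A_{t+1}, reset at episode boundaries
        last_gae = delta + gamma * gae_lambda * not_done * last_gae
        advantages[t] = last_gae
    # Value-function targets: R_t = A_t + V(s_t)
    returns = [a + v for a, v in zip(advantages, values)]
    return advantages, returns
```

Setting ``gae_lambda=1.0`` recovers discounted Monte Carlo returns minus the baseline, while ``gae_lambda=0.0`` reduces each advantage to the one-step TD residual.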
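
``PPOAlgorithm._train_once`` performs the policy and value updates. The defining part of the standard PPO policy update is the clipped surrogate objective. The sketch below computes that loss in plain Python for a batch of log-probabilities and advantages. The function name and the ``clip_coef`` default of 0.2 are illustrative assumptions, and the value-function loss and entropy bonus that PPO implementations usually add are omitted.

```python
import math
from typing import Sequence


def clipped_surrogate_loss(
    new_log_probs: Sequence[float],
    old_log_probs: Sequence[float],
    advantages: Sequence[float],
    clip_coef: float = 0.2,
) -> float:
    """Hypothetical PPO policy loss (to be minimized), averaged over the batch."""
    if not advantages:
        raise ValueError("advantages must be non-empty")
    losses = []
    for new_lp, old_lp, adv in zip(new_log_probs, old_log_probs, advantages):
        # Probability ratio pi_new(a|s) / pi_old(a|s), computed in log space.
        ratio = math.exp(new_lp - old_lp)
        unclipped = ratio * adv
        clipped = max(min(ratio, 1.0 + clip_coef), 1.0 - clip_coef) * adv
        # Take the pessimistic (smaller) objective, negated to form a loss.
        losses.append(-min(unclipped, clipped))
    return sum(losses) / len(losses)
```

Clipping removes the incentive to push the ratio outside ``[1 - clip_coef, 1 + clip_coef]``, which keeps each update close to the policy that collected the data.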
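
``RandomAlgorithm.train`` and ``PyODAlgorithm.train`` are both described as searching for the parameter value (a probability or a threshold) that maximizes evaluation reward. A minimal sketch of such a search follows. It assumes a hypothetical ``evaluate`` callable that stands in for running the policy through the validators and returning a mean reward; ``search_best_parameter`` is likewise a hypothetical name.

```python
from typing import Callable, Iterable, Optional, Tuple


def search_best_parameter(
    candidates: Iterable[float], evaluate: Callable[[float], float]
) -> Tuple[Optional[float], float]:
    """Hypothetical grid search returning (best_candidate, best_reward)."""
    best_param: Optional[float] = None
    best_reward = float("-inf")
    for param in candidates:
        reward = evaluate(param)
        # Strict '>' keeps the earliest candidate when rewards tie.
        if reward > best_reward:
            best_param, best_reward = param, reward
    return best_param, best_reward


# Toy usage: a made-up reward curve that peaks at p = 0.3.
probabilities = [i / 10 for i in range(11)]
best_p, best_reward = search_best_parameter(probabilities, lambda p: -(p - 0.3) ** 2)
```

Each ``evaluate`` call would correspond to a full evaluation run, so the grid size directly controls the cost of training.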