duo_ai.algorithms.logit
=======================

.. py:module:: duo_ai.algorithms.logit


Classes
-------

.. autoapisummary::

   duo_ai.algorithms.logit.LogitAlgorithmConfig
   duo_ai.algorithms.logit.LogitAlgorithm


Module Contents
---------------

.. py:class:: LogitAlgorithmConfig

   Configuration for the LogitAlgorithm, which tunes thresholds and temperatures for confidence-based policies.

   :param name: Name of the algorithm class. Default is "logit".
   :type name: str, optional
   :param num_rollouts: Number of rollouts to use for score generation. Default is 128.
   :type num_rollouts: int, optional
   :param percentiles: List of percentiles to use for threshold selection. Default is range(0, 101, 10).
   :type percentiles: list of float, optional
   :param explore_temps: List of temperatures to use during exploration rollouts. Default is [1.0].
   :type explore_temps: list of float, optional
   :param score_temps: List of temperatures to use when scoring. Default is [1.0].
   :type score_temps: list of float, optional

   .. rubric:: Examples

   >>> config = LogitAlgorithmConfig()


   .. py:attribute:: name
      :type:  str
      :value: 'logit'


   .. py:attribute:: num_rollouts
      :type:  int
      :value: 128


   .. py:attribute:: percentiles
      :type:  List[float]


   .. py:attribute:: explore_temps
      :type:  List[float]
      :value: [1.0]


   .. py:attribute:: score_temps
      :type:  List[float]
      :value: [1.0]


.. py:class:: LogitAlgorithm(config: LogitAlgorithmConfig)

   Bases: :py:obj:`duo_ai.core.Algorithm`


   Algorithm for tuning confidence-based policies using logit thresholds and temperatures.

   .. rubric:: Examples

   >>> algo = LogitAlgorithm(LogitAlgorithmConfig())


   .. py:attribute:: config_cls


   .. py:attribute:: config


   .. py:method:: train(policy: duo.core.Policy, env: gym.Env, validators: Dict[str, duo.core.Evaluator]) -> None

      Train the LogitAlgorithm by searching for the best threshold and temperature parameters based on rollout scores and evaluation results.

      :param policy: The policy to be trained and evaluated.
      :type policy: duo.core.Policy
      :param env: The environment used for training and rollouts.
      :type env: gym.Env
      :param validators: Dictionary mapping split names to evaluator instances for evaluation.
      :type validators: dict of str to duo.core.Evaluator

      :rtype: None

      .. rubric:: Examples

      >>> algorithm = LogitAlgorithm(LogitAlgorithmConfig())
      >>> algorithm.train(policy, env, validators)


   .. py:method:: save_checkpoint(policy: duo.core.Policy, name: str) -> None

      Save the current policy configuration and parameters to a checkpoint file.

      :param policy: The policy whose parameters are to be saved.
      :type policy: duo.core.Policy
      :param name: Name for the checkpoint file.
      :type name: str

      :rtype: None

      .. rubric:: Examples

      >>> self.save_checkpoint(policy, "best_test")


   .. py:method:: _generate_scores(env: gym.Env, policy: duo.core.Policy, temperature: float, num_rollouts: int) -> list

      Generate confidence scores by rolling out the policy in the environment.

      :param env: The environment used for rollouts.
      :type env: gym.Env
      :param policy: The policy to be evaluated.
      :type policy: duo.core.Policy
      :param temperature: Temperature parameter for action selection.
      :type temperature: float
      :param num_rollouts: Total number of rollout episodes to generate.
      :type num_rollouts: int

      :returns: **scores** -- List of confidence scores collected from rollouts.
      :rtype: list of float

      .. rubric:: Examples

      >>> scores = self._generate_scores(env, policy, 1.0, 128)
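
The reference above says that ``train`` searches over thresholds and temperatures using rollout scores, and that ``percentiles`` drives threshold selection. It does not show how the two connect. The following is a minimal sketch of one plausible reading: each percentile is mapped to the score value at that percentile, and every (temperature, threshold) pair is scored by an evaluation callback. The helpers ``percentile``, ``grid_search``, and ``toy_evaluate`` are hypothetical names introduced for illustration; they are not part of ``duo_ai``, and the library's actual search may differ.

```python
from typing import Callable, Dict, List, Optional, Tuple


def percentile(sorted_scores: List[float], p: float) -> float:
    """Linearly interpolated percentile of an ascending list (hypothetical helper)."""
    if not sorted_scores:
        raise ValueError("need at least one score")
    rank = (len(sorted_scores) - 1) * p / 100.0
    lo = int(rank)  # floor, since rank is never negative
    hi = min(lo + 1, len(sorted_scores) - 1)
    return sorted_scores[lo] + (sorted_scores[hi] - sorted_scores[lo]) * (rank - lo)


def grid_search(
    scores_by_temp: Dict[float, List[float]],
    percentiles: List[float],
    evaluate: Callable[[float, float], float],
) -> Optional[Tuple[float, float, float]]:
    """Return the (temperature, threshold, value) triple with the highest value."""
    best = None
    for temp, scores in scores_by_temp.items():
        ordered = sorted(scores)
        for p in percentiles:
            threshold = percentile(ordered, p)
            value = evaluate(temp, threshold)
            # Strict comparison keeps the first candidate on ties.
            if best is None or value > best[2]:
                best = (temp, threshold, value)
    return best


# Toy stand-ins: in practice the scores would come from rollouts and the
# evaluation value from an evaluator; both are assumptions here.
scores_by_temp = {1.0: [0.5, 0.1, 0.4, 0.2, 0.3]}
percentiles = list(range(0, 101, 10))


def toy_evaluate(temp: float, threshold: float) -> float:
    # Pretend the evaluation metric peaks when the threshold is 0.3.
    return -abs(threshold - 0.3)


best = grid_search(scores_by_temp, percentiles, toy_evaluate)
print(best)
```

With the default ``percentiles`` of ``range(0, 101, 10)``, a sketch like this would try eleven candidate thresholds per temperature, so the number of evaluations grows with the product of the two list lengths.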