duo_ai.algorithms.logit

Classes

`LogitAlgorithmConfig`	Configuration for the LogitAlgorithm, which tunes thresholds and temperatures for confidence-based policies.
`LogitAlgorithm`	Algorithm for tuning confidence-based policies using logit thresholds and temperatures.

class duo_ai.algorithms.logit.LogitAlgorithmConfig

Configuration for the LogitAlgorithm, which tunes thresholds and temperatures for confidence-based policies.

Parameters:

name (str, optional) – Name of the algorithm class. Default is “logit”.
num_rollouts (int, optional) – Number of rollouts to use for score generation. Default is 128.
percentiles (list of float, optional) – List of percentiles to use for threshold selection. Default is range(0, 101, 10), that is, every tenth percentile from 0 to 100 inclusive.
explore_temps (list of float, optional) – List of temperatures to use during exploration rollouts. Default is [1.0].
score_temps (list of float, optional) – List of temperatures to use when scoring. Default is [1.0].

Examples

>>> config = LogitAlgorithmConfig()
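
The documented fields and defaults can be pictured as a plain dataclass. The sketch below is a hypothetical stand-in (`LogitConfigSketch` is not part of the library), and it assumes that the search grid is the product of `percentiles` and `score_temps`, which this page does not state explicitly.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class LogitConfigSketch:
    """Hypothetical stand-in mirroring the documented fields and defaults."""

    name: str = "logit"
    num_rollouts: int = 128
    percentiles: List[float] = field(default_factory=lambda: list(range(0, 101, 10)))
    explore_temps: List[float] = field(default_factory=lambda: [1.0])
    score_temps: List[float] = field(default_factory=lambda: [1.0])


# Override one field and keep the remaining defaults.
cfg = LogitConfigSketch(score_temps=[0.5, 1.0, 2.0])

# Assumption: one candidate per (percentile, scoring temperature) pair.
grid_size = len(cfg.percentiles) * len(cfg.score_temps)
```

Under that assumption, widening either list grows the number of candidates multiplicatively, so the two lists are best kept short.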

class duo_ai.algorithms.logit.LogitAlgorithm(config: LogitAlgorithmConfig)

Algorithm for tuning confidence-based policies using logit thresholds and temperatures.

Examples

>>> algo = LogitAlgorithm(LogitAlgorithmConfig())

train(policy: duo.core.Policy, env: gym.Env, validators: Dict[str, duo.core.Evaluator]) → None

Train the LogitAlgorithm by searching for the best threshold and temperature parameters based on rollout scores and evaluation results.

Parameters:

policy (duo.core.Policy) – The policy to be trained and evaluated.
env (gym.Env) – The environment used for training and rollouts.
validators (dict of str to duo.core.Evaluator) – Dictionary mapping split names to evaluator instances for evaluation.

Return type:

None

Examples

>>> algorithm = LogitAlgorithm(LogitAlgorithmConfig())
>>> algorithm.train(policy, env, validators)
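
The search described for `train` can be sketched as a grid search over temperatures and percentile-derived thresholds. This is a minimal illustration under stated assumptions, not the library's implementation: `percentile`, `search`, and the `evaluate` callable are hypothetical names, and the way thresholds are derived from scores (linear-interpolation percentiles) is assumed rather than documented.

```python
import math


def percentile(values, p):
    """Linear-interpolation percentile of values, for p in [0, 100]."""
    ordered = sorted(values)
    rank = (len(ordered) - 1) * p / 100.0
    lo = math.floor(rank)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (rank - lo) * (ordered[hi] - ordered[lo])


def search(scores_by_temp, percentiles, evaluate):
    """Return (best_value, temperature, threshold) over the full grid.

    scores_by_temp: dict mapping a temperature to its list of confidence scores.
    evaluate: hypothetical callable (temperature, threshold) -> float,
              where a higher value is better.
    """
    best = None
    for temp, scores in scores_by_temp.items():
        for p in percentiles:
            threshold = percentile(scores, p)
            value = evaluate(temp, threshold)
            if best is None or value > best[0]:
                best = (value, temp, threshold)
    return best


# Toy data: the placeholder evaluator simply prefers thresholds near 0.5.
scores_by_temp = {1.0: [0.1, 0.3, 0.5, 0.7, 0.9]}
best = search(scores_by_temp, range(0, 101, 25), lambda t, thr: -abs(thr - 0.5))
```

In the real algorithm the role of `evaluate` would presumably be played by the evaluators passed in `validators`; the toy lambda here only makes the sketch runnable.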

save_checkpoint(policy: duo.core.Policy, name: str) → None

Save the current policy configuration and parameters to a checkpoint file.

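Parameters:

policy (duo.core.Policy) – The policy whose configuration and parameters are saved.
name (str) – Name identifying the checkpoint, for example "best_test".
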
Return type:

None

Examples

>>> self.save_checkpoint(policy, "best_test")

_generate_scores(env: gym.Env, policy: duo.core.Policy, temperature: float, num_rollouts: int) → list

Generate confidence scores by rolling out the policy in the environment.

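Parameters:

env (gym.Env) – The environment in which the policy is rolled out.
policy (duo.core.Policy) – The policy to roll out.
temperature (float) – Temperature to use for the rollouts.
num_rollouts (int) – Number of rollouts to run.
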
Returns:

scores – List of confidence scores collected from rollouts.

Return type:

list of float

Examples

>>> scores = self._generate_scores(env, policy, 1.0, 128)
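
This page does not define how a confidence score is computed from the policy's logits. One common choice, shown here purely as an assumption, is the maximum softmax probability after dividing the logits by the temperature; `confidence` is a hypothetical helper, not a library function.

```python
import math


def confidence(logits, temperature=1.0):
    """Maximum softmax probability of temperature-scaled logits (assumed definition)."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    # Subtract the maximum before exponentiating for numerical stability.
    exps = [math.exp(x - peak) for x in scaled]
    return max(exps) / sum(exps)


sharp = confidence([2.0, 0.0], temperature=1.0)
soft = confidence([2.0, 0.0], temperature=2.0)
```

With this definition a higher temperature flattens the distribution and lowers the score, which is why the same policy can yield different score distributions, and therefore different percentile thresholds, at different temperatures.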