duo_ai.core.evaluator

Classes

EvaluatorConfig

Configuration for the Evaluator.

Evaluator

Evaluator for running policy evaluation on environments and summarizing results.

EvaluationSummarizer

Summarizer for evaluation statistics and logging.

Module Contents

class duo_ai.core.evaluator.EvaluatorConfig

Configuration for the Evaluator.

Parameters:
  • num_episodes (int, optional) – Number of episodes to use for evaluation. Default is 256.

  • max_num_steps (int, optional) – Maximum number of steps per episode. Default is 256.

  • temperature (float, optional) – Temperature parameter for action selection. Default is 1.0.

  • log_action_id (int, optional) – The action index to track and log during evaluation. Default is CoordEnv.EXPERT.
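The temperature parameter can be illustrated with a minimal sketch. It assumes the common convention of dividing action logits by the temperature before a softmax; the reference above does not confirm how Evaluator applies it, and `sample_action` is a hypothetical helper, not part of duo_ai.

```python
import math
import random


def sample_action(logits, temperature=1.0, rng=random):
    """Sample an action index from temperature-scaled softmax probabilities.

    Hypothetical helper for illustration; not part of duo_ai.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    # Subtract the max for numerical stability before exponentiating.
    m = max(scaled)
    weights = [math.exp(x - m) for x in scaled]
    total = sum(weights)
    probs = [w / total for w in weights]
    index = rng.choices(range(len(logits)), weights=probs, k=1)[0]
    return index, probs


# Lower temperature concentrates probability on the largest logit;
# higher temperature flattens the distribution.
idx, sharp = sample_action([2.0, 1.0, 0.0], temperature=0.5)
_, flat = sample_action([2.0, 1.0, 0.0], temperature=2.0)
```

Under this convention, a temperature of 1.0 (the documented default) leaves the softmax distribution unchanged.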

Examples

>>> config = EvaluatorConfig(num_episodes=100, temperature=0.5)

num_episodes: int = 256
max_num_steps: int = 256
temperature: float = 1.0
log_action_id: int = 1
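The fields and defaults listed above can be mirrored in a small stand-in, which is handy for trying out overrides without installing the package. This is a sketch: `EvaluatorConfigSketch` is a hypothetical name, and whether the real EvaluatorConfig is a dataclass is an assumption.

```python
from dataclasses import dataclass, replace


@dataclass
class EvaluatorConfigSketch:
    """Hypothetical stand-in mirroring the documented fields and defaults."""

    num_episodes: int = 256
    max_num_steps: int = 256
    temperature: float = 1.0
    # Listed as 1 above; the parameter text names it CoordEnv.EXPERT.
    log_action_id: int = 1


default_cfg = EvaluatorConfigSketch()
# Override only the fields that differ; the rest keep their defaults.
quick_cfg = replace(default_cfg, num_episodes=100, temperature=0.5)
```

Using `dataclasses.replace` leaves `default_cfg` untouched, so several evaluation settings can be derived from one base configuration.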
class duo_ai.core.evaluator.Evaluator(config: EvaluatorConfig, env: gym.Env)

Evaluator for running policy evaluation on environments and summarizing results.

Examples

>>> evaluator = Evaluator(EvaluatorConfig(), env)
>>> summary = evaluator.evaluate(policy)

config_cls
config
env
evaluate(policy: duo_ai.core.Policy, num_episodes: int | None = None) → Dict[str, Any]

Evaluate a policy on the environment and summarize the results.

Parameters:
  • policy (duo_ai.core.Policy) – The policy to evaluate. Must implement an act method and have a .model attribute.

  • num_episodes (int, optional) – Number of episodes to run. If None, uses value from config.
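The fallback described for num_episodes might be resolved as in the sketch below. `resolve_num_episodes` is a hypothetical helper; the actual check inside evaluate is not shown in this reference.

```python
from typing import Optional


def resolve_num_episodes(override: Optional[int], configured: int) -> int:
    """Return the per-call override if given, else the configured value.

    Hypothetical helper for illustration; not part of duo_ai.
    """
    # Compare against None explicitly so that an explicit 0 is not
    # silently replaced by the configured value.
    return configured if override is None else override


from_config = resolve_num_episodes(None, 256)
from_call = resolve_num_episodes(100, 256)
```

Testing with `is None` rather than truthiness keeps the per-call argument authoritative whenever the caller supplies one.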

Returns:

A dictionary mapping split names to summary statistics for each evaluation.

Return type:

dict

Examples

>>> summary = evaluator.evaluate(policy, num_episodes=100)
>>> print(summary['reward_mean'])

_eval_one_iteration(policy: duo_ai.core.Policy, env: gym.Env) → None

Run a single evaluation iteration for the policy on the environment.

Parameters:
  • policy (duo_ai.core.Policy) – The policy to evaluate.

  • env (gym.Env) – The environment instance to evaluate on.

Return type:

None
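A single evaluation iteration can be pictured as a rollout that stops when the episode ends or when max_num_steps is reached. The sketch below rests on several assumptions: `CountdownEnv` is a toy stand-in with a simplified interface rather than the gym.Env API, `run_episode` is a hypothetical helper, and it returns its totals for inspection whereas the documented method returns None.

```python
class CountdownEnv:
    """Toy stand-in environment: the episode ends after `horizon` steps."""

    def __init__(self, horizon):
        self.horizon = horizon
        self.t = 0

    def reset(self):
        self.t = 0
        return self.t

    def step(self, action):
        self.t += 1
        reward = 1.0 if action == 1 else 0.0
        done = self.t >= self.horizon
        return self.t, reward, done


def run_episode(act, env, max_num_steps):
    """Roll out one episode, stopping at `done` or after `max_num_steps`."""
    obs = env.reset()
    total_reward, steps = 0.0, 0
    for _ in range(max_num_steps):
        obs, reward, done = env.step(act(obs))
        total_reward += reward
        steps += 1
        if done:
            break
    return total_reward, steps


def always_one(obs):
    # Trivial policy that always picks action 1.
    return 1


# The step cap truncates the first rollout; the second ends on its own.
capped = run_episode(always_one, CountdownEnv(horizon=10), max_num_steps=4)
finished = run_episode(always_one, CountdownEnv(horizon=3), max_num_steps=4)
```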

class duo_ai.core.evaluator.EvaluationSummarizer(config: EvaluatorConfig)

Summarizer for evaluation statistics and logging.

Examples

>>> summarizer = EvaluationSummarizer(EvaluatorConfig())

log_action_id
clear() → None

Clear the summary statistics log.

Return type:

None

initialize_episode(env: gym.Env) → None

Initialize logging for a new evaluation episode.

Parameters:

env (gym.Env) – The environment instance for the episode.

Return type:

None

finalize_episode() → None

Finalize and aggregate statistics for the episode.

Return type:

None

add_episode_step(env: gym.Env, action: torch.Tensor, reward: numpy.ndarray, info: List[Dict[str, Any]], has_done: numpy.ndarray) → None

Log statistics for each episode step.

Parameters:
  • env (gym.Env) – The environment instance.

  • action (torch.Tensor) – Actions taken at this step.

  • reward (np.ndarray) – Rewards received at this step.

  • info (list of dict) – Additional info for each environment.

  • has_done (np.ndarray) – Boolean array indicating which episodes are done.

Return type:

None

summarize() → Dict[str, Any]

Compute summary statistics for the current log.

Returns:

Dictionary of summary statistics.

Return type:

dict

Examples

>>> summary = summarizer.summarize()

write(summary: Dict[str, Any] | None = None) → Dict[str, Any]

Pretty-print and log the summary statistics.

Parameters:

summary (dict, optional) – Precomputed summary statistics. If None, will compute from log.

Returns:

The summary statistics that were logged.

Return type:

dict

Examples

>>> logged_summary = summarizer.write()
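
The methods above suggest a call order of clear, initialize_episode, add_episode_step (once per step), finalize_episode, then summarize. The sketch below follows that order with a hypothetical stand-in, `SummarizerSketch`. It uses plain lists instead of torch.Tensor and numpy arrays, takes a `num_envs` count instead of an env, and omits the `info` argument. The `action_frac` key is invented for illustration; only `reward_mean` appears in the examples above.

```python
from statistics import mean


class SummarizerSketch:
    """Hypothetical stand-in that follows the documented method order."""

    def __init__(self, log_action_id=1):
        self.log_action_id = log_action_id
        self.clear()

    def clear(self):
        self.episode_rewards = []
        self.action_hits = 0
        self.action_total = 0

    def initialize_episode(self, num_envs):
        self._returns = [0.0] * num_envs
        self._active = [True] * num_envs

    def add_episode_step(self, action, reward, has_done):
        for i, (a, r, d) in enumerate(zip(action, reward, has_done)):
            if not self._active[i]:
                continue  # ignore steps after an environment has finished
            self._returns[i] += r
            self.action_hits += int(a == self.log_action_id)
            self.action_total += 1
            if d:
                self._active[i] = False

    def finalize_episode(self):
        self.episode_rewards.extend(self._returns)

    def summarize(self):
        return {
            "reward_mean": mean(self.episode_rewards),
            "action_frac": self.action_hits / self.action_total,
        }


s = SummarizerSketch(log_action_id=1)
s.clear()
s.initialize_episode(num_envs=2)
s.add_episode_step(action=[1, 0], reward=[1.0, 0.5], has_done=[False, True])
# The second environment is already done, so its values here are ignored.
s.add_episode_step(action=[1, 1], reward=[1.0, 9.0], has_done=[True, True])
s.finalize_episode()
summary = s.summarize()
```

Masking finished environments in add_episode_step keeps rewards from steps taken after termination out of the per-episode totals, which is one plausible reading of the has_done argument.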