duo_ai.core.evaluator
Classes
EvaluatorConfig | Configuration for the Evaluator.
Evaluator | Evaluator for running policy evaluation on environments and summarizing results.
EvaluationSummarizer | Summarizer for evaluation statistics and logging.
Module Contents
- class duo_ai.core.evaluator.EvaluatorConfig
Configuration for the Evaluator.
- Parameters:
num_episodes (int, optional) – Number of episodes to use for evaluation. Default is 256.
max_num_steps (int, optional) – Maximum number of steps per episode. Default is 256.
temperature (float, optional) – Temperature parameter for action selection. Default is 1.0.
log_action_id (int, optional) – The action index to track and log during evaluation. Default is CoordEnv.EXPERT (1).
Examples
>>> config = EvaluatorConfig(num_episodes=100, temperature=0.5)
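The documented fields and defaults can be mirrored in a small stand-in to show how overrides behave. This is a sketch that assumes the config acts like a plain dataclass, which the source does not confirm; `EvaluatorConfigSketch` is a hypothetical name and is not part of duo_ai.

```python
from dataclasses import dataclass, replace


@dataclass
class EvaluatorConfigSketch:
    """Hypothetical stand-in that mirrors the documented fields and defaults."""

    num_episodes: int = 256    # episodes used for evaluation
    max_num_steps: int = 256   # step cap per episode
    temperature: float = 1.0   # temperature for action selection
    log_action_id: int = 1     # action index tracked during evaluation


# Override two fields; the others keep their documented defaults.
config = EvaluatorConfigSketch(num_episodes=100, temperature=0.5)

# Derive a cheaper variant for smoke tests without mutating the original.
quick = replace(config, num_episodes=8)
```

Copying with `replace` keeps the original configuration intact, which is convenient when the same base settings are reused across several evaluation runs.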
- num_episodes: int = 256
- max_num_steps: int = 256
- temperature: float = 1.0
- log_action_id: int = 1
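The temperature field is described only as a parameter for action selection. One common interpretation, assumed here and not confirmed by the source, is to divide the policy logits by the temperature before applying a softmax. The helper names below are hypothetical.

```python
import math
import random
from typing import List, Optional


def softmax_with_temperature(logits: List[float], temperature: float = 1.0) -> List[float]:
    """Convert logits to probabilities; lower temperature sharpens the distribution."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_action(logits: List[float], temperature: float = 1.0,
                  rng: Optional[random.Random] = None) -> int:
    """Sample an action index from the temperature-scaled distribution."""
    rng = rng or random.Random()
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


base = softmax_with_temperature([1.0, 2.0, 3.0], temperature=1.0)
sharp = softmax_with_temperature([1.0, 2.0, 3.0], temperature=0.5)
flat = softmax_with_temperature([1.0, 2.0, 3.0], temperature=2.0)
action = sample_action([1.0, 2.0, 3.0], temperature=0.5, rng=random.Random(0))
```

Under this assumption, values below 1.0 concentrate probability on the highest-scoring action, and values above 1.0 spread it more evenly.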
- class duo_ai.core.evaluator.Evaluator(config: EvaluatorConfig, env: gym.Env)
Evaluator for running policy evaluation on environments and summarizing results.
Examples
>>> evaluator = Evaluator(EvaluatorConfig(), env)
>>> summary = evaluator.evaluate(policy)
- config_cls
- config
- env
- evaluate(policy: duo_ai.core.Policy, num_episodes: int | None = None) -> Dict[str, Any]
Evaluate a policy on the environment and summarize the results.
- Parameters:
policy (duo_ai.core.Policy) – The policy to evaluate. Must implement an act method and have a .model attribute.
num_episodes (int, optional) – Number of episodes to run. If None, uses value from config.
- Returns:
A dictionary mapping split names to summary statistics for each evaluation.
- Return type:
dict
Examples
>>> summary = evaluator.evaluate(policy, num_episodes=100)
>>> print(summary['reward_mean'])
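The fallback from `num_episodes=None` to the configured value, together with a per-episode step cap, can be sketched with a toy loop. Everything here is an assumption made for illustration: `CountdownEnv` and `evaluate_sketch` are hypothetical, the environment uses a simplified single-instance interface and not `gym.Env`, and the real evaluator may batch environments and report different keys.

```python
from typing import Callable, Dict, List, Optional, Tuple


class CountdownEnv:
    """Toy environment (hypothetical): ends after `horizon` steps, pays 1.0 for action 1."""

    def __init__(self, horizon: int = 5) -> None:
        self.horizon = horizon
        self.t = 0

    def reset(self) -> int:
        self.t = 0
        return self.t

    def step(self, action: int) -> Tuple[int, float, bool]:
        self.t += 1
        reward = 1.0 if action == 1 else 0.0
        return self.t, reward, self.t >= self.horizon


def evaluate_sketch(
    policy: Callable[[int], int],
    env: CountdownEnv,
    num_episodes: Optional[int] = None,
    default_num_episodes: int = 256,
    max_num_steps: int = 256,
) -> Dict[str, float]:
    """Run episodes and return summary statistics."""
    if num_episodes is None:
        num_episodes = default_num_episodes  # fall back to the configured value
    returns: List[float] = []
    for _ in range(num_episodes):
        obs = env.reset()
        total = 0.0
        for _ in range(max_num_steps):  # cap the episode length
            obs, reward, done = env.step(policy(obs))
            total += reward
            if done:
                break
        returns.append(total)
    mean = sum(returns) / len(returns) if returns else 0.0
    return {"num_episodes": len(returns), "reward_mean": mean}


always_one = lambda obs: 1  # trivial policy that always picks action 1
full = evaluate_sketch(always_one, CountdownEnv(horizon=5), num_episodes=3)
capped = evaluate_sketch(always_one, CountdownEnv(horizon=5), num_episodes=3, max_num_steps=2)
default = evaluate_sketch(always_one, CountdownEnv(horizon=5))
```

In this toy setup, lowering `max_num_steps` below the environment horizon truncates every episode, so the mean return drops accordingly.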
- _eval_one_iteration(policy: duo_ai.core.Policy, env: gym.Env) -> None
Run a single evaluation iteration for the policy on the environment.
- Parameters:
policy (duo_ai.core.Policy) – The policy to evaluate.
env (gym.Env) – The environment instance to evaluate on.
- Return type:
None
- class duo_ai.core.evaluator.EvaluationSummarizer(config: EvaluatorConfig)
Summarizer for evaluation statistics and logging.
Examples
>>> summarizer = EvaluationSummarizer(EvaluatorConfig())
- log_action_id
- initialize_episode(env: gym.Env) -> None
Initialize logging for a new evaluation episode.
- Parameters:
env (gym.Env) – The environment instance for the episode.
- Return type:
None
- finalize_episode() -> None
Finalize and aggregate statistics for the episode.
- Return type:
None
- add_episode_step(env: gym.Env, action: torch.Tensor, reward: numpy.ndarray, info: List[Dict[str, Any]], has_done: numpy.ndarray) -> None
Log statistics for each episode step.
- Parameters:
env (gym.Env) – The environment instance.
action (torch.Tensor) – Actions taken at this step.
reward (np.ndarray) – Rewards received at this step.
info (list of dict) – Additional info for each environment.
has_done (np.ndarray) – Boolean array indicating which episodes are done.
- Return type:
None
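Per-step logging over a batch of environments can be sketched as follows. This is an illustration under stated assumptions, not the library's implementation: plain Python lists stand in for the tensor and array arguments, `has_done[i]` is assumed to be True when environment `i` had already finished before this step (so its entries are skipped), and `StepLogSketch` is a hypothetical name.

```python
from typing import List


class StepLogSketch:
    """Hypothetical per-environment accumulator for returns, lengths, and one tracked action."""

    def __init__(self, num_envs: int, log_action_id: int = 1) -> None:
        self.log_action_id = log_action_id
        self.episode_return = [0.0] * num_envs
        self.episode_length = [0] * num_envs
        self.action_count = [0] * num_envs

    def add_step(self, actions: List[int], rewards: List[float], has_done: List[bool]) -> None:
        for i, (action, reward, done) in enumerate(zip(actions, rewards, has_done)):
            if done:
                continue  # assumed: this environment already finished, so ignore its entry
            self.episode_return[i] += reward
            self.episode_length[i] += 1
            if action == self.log_action_id:
                self.action_count[i] += 1


log = StepLogSketch(num_envs=2)
log.add_step(actions=[1, 0], rewards=[1.0, 0.5], has_done=[False, False])
# The second environment is marked as finished, so only the first one is updated.
log.add_step(actions=[1, 1], rewards=[1.0, 2.0], has_done=[False, True])
```

Masking finished environments keeps their statistics from being polluted by steps taken while the remaining environments in the batch are still running.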
- summarize() -> Dict[str, Any]
Compute summary statistics for the current log.
- Returns:
Dictionary of summary statistics.
- Return type:
dict
Examples
>>> summary = summarizer.summarize()
- write(summary: Dict[str, Any] | None = None) -> Dict[str, Any]
Pretty-print and log the summary statistics.
- Parameters:
summary (dict, optional) – Precomputed summary statistics. If None, will compute from log.
- Returns:
The summary statistics that were logged.
- Return type:
dict
Examples
>>> logged_summary = summarizer.write()
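The relationship between `summarize` and `write` described above (compute the summary only when none is supplied, then return what was logged) can be sketched with a small stand-in. `SummarizerSketch` and the `reward_std` key are hypothetical, and `print` stands in for whatever logger the library actually uses.

```python
import json
from statistics import mean, pstdev
from typing import Any, Dict, List, Optional


class SummarizerSketch:
    """Hypothetical stand-in that summarizes a list of episode returns."""

    def __init__(self) -> None:
        self.episode_returns: List[float] = []

    def summarize(self) -> Dict[str, Any]:
        if not self.episode_returns:
            return {"num_episodes": 0}
        return {
            "num_episodes": len(self.episode_returns),
            "reward_mean": mean(self.episode_returns),
            "reward_std": pstdev(self.episode_returns),  # population standard deviation
        }

    def write(self, summary: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
        if summary is None:
            summary = self.summarize()  # compute from the log when nothing is passed in
        print(json.dumps(summary, indent=2, sort_keys=True))
        return summary


summarizer = SummarizerSketch()
summarizer.episode_returns = [1.0, 3.0]
logged = summarizer.write()
passthrough = summarizer.write({"reward_mean": 0.25})
```

Returning the summary from `write` lets callers log and capture the statistics in one call, and passing a precomputed summary avoids recomputing it.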