duo_ai.core.evaluator
=====================

.. py:module:: duo_ai.core.evaluator


Classes
-------

.. autoapisummary::

   duo_ai.core.evaluator.EvaluatorConfig
   duo_ai.core.evaluator.Evaluator
   duo_ai.core.evaluator.EvaluationSummarizer


Module Contents
---------------

.. py:class:: EvaluatorConfig

   Configuration for the Evaluator.

   :param num_episodes: Number of episodes to use for evaluation. Default is 256.
   :type num_episodes: int, optional
   :param max_num_steps: Maximum number of steps per episode. Default is 256.
   :type max_num_steps: int, optional
   :param temperature: Temperature parameter for action selection. Default is 1.0.
   :type temperature: float, optional
   :param log_action_id: The action index to track and log during evaluation. Default is CoordEnv.EXPERT.
   :type log_action_id: int, optional

   .. rubric:: Examples

   >>> config = EvaluatorConfig(num_episodes=100, temperature=0.5)


   .. py:attribute:: num_episodes
      :type: int
      :value: 256


   .. py:attribute:: max_num_steps
      :type: int
      :value: 256


   .. py:attribute:: temperature
      :type: float
      :value: 1.0


   .. py:attribute:: log_action_id
      :type: int
      :value: 1


.. py:class:: Evaluator(config: EvaluatorConfig, env: gym.Env)

   Evaluator for running policy evaluation on environments and summarizing results.

   .. rubric:: Examples

   >>> evaluator = Evaluator(EvaluatorConfig(), env)
   >>> summary = evaluator.evaluate(policy)


   .. py:attribute:: config_cls


   .. py:attribute:: config


   .. py:attribute:: env


   .. py:method:: evaluate(policy: duo_ai.core.Policy, num_episodes: Optional[int] = None) -> Dict[str, Any]

      Evaluate a policy on the environment and summarize the results.

      :param policy: The policy to evaluate. Must implement an ``act`` method and have a ``.model`` attribute.
      :type policy: duo_ai.core.Policy
      :param num_episodes: Number of episodes to run. If None, the value from the config is used.
      :type num_episodes: int, optional

      :returns: A dictionary mapping split names to summary statistics for each evaluation.
      :rtype: dict

      .. rubric:: Examples

      >>> summary = evaluator.evaluate(policy, num_episodes=100)
      >>> print(summary['reward_mean'])


   .. py:method:: _eval_one_iteration(policy: duo_ai.core.Policy, env: gym.Env) -> None

      Run a single evaluation iteration for the policy on the environment.

      :param policy: The policy to evaluate.
      :type policy: duo_ai.core.Policy
      :param env: The environment instance to evaluate on.
      :type env: gym.Env

      :rtype: None


.. py:class:: EvaluationSummarizer(config: EvaluatorConfig)

   Summarizer for evaluation statistics and logging.

   .. rubric:: Examples

   >>> summarizer = EvaluationSummarizer(EvaluatorConfig())


   .. py:attribute:: log_action_id


   .. py:method:: clear() -> None

      Clear the summary statistics log.

      :rtype: None


   .. py:method:: initialize_episode(env: gym.Env) -> None

      Initialize logging for a new evaluation episode.

      :param env: The environment instance for the episode.
      :type env: gym.Env

      :rtype: None


   .. py:method:: finalize_episode() -> None

      Finalize and aggregate statistics for the episode.

      :rtype: None


   .. py:method:: add_episode_step(env: gym.Env, action: torch.Tensor, reward: numpy.ndarray, info: List[Dict[str, Any]], has_done: numpy.ndarray) -> None

      Log statistics for each episode step.

      :param env: The environment instance.
      :type env: gym.Env
      :param action: Actions taken at this step.
      :type action: torch.Tensor
      :param reward: Rewards received at this step.
      :type reward: np.ndarray
      :param info: Additional info for each environment.
      :type info: list of dict
      :param has_done: Boolean array indicating which episodes are done.
      :type has_done: np.ndarray

      :rtype: None


   .. py:method:: summarize() -> Dict[str, Any]

      Compute summary statistics for the current log.

      :returns: Dictionary of summary statistics.
      :rtype: dict

      .. rubric:: Examples

      >>> summary = summarizer.summarize()


   .. py:method:: write(summary: Optional[Dict[str, Any]] = None) -> Dict[str, Any]

      Pretty-print and log the summary statistics.

      :param summary: Precomputed summary statistics. If None, they are computed from the log.
      :type summary: dict, optional

      :returns: The summary statistics that were logged.
      :rtype: dict

      .. rubric:: Examples

      >>> logged_summary = summarizer.write()
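

The method list for ``EvaluationSummarizer`` suggests a call order of ``clear``, ``initialize_episode``, one ``add_episode_step`` per step, ``finalize_episode``, then ``summarize``. The sketch below illustrates that order with a small stand-in written only with the standard library. It is not the ``duo_ai`` implementation: ``ToyConfig`` and ``ToySummarizer`` are hypothetical names, the toy methods take plain lists and omit the ``env`` and ``info`` arguments, and the ``num_episodes`` and ``log_action_frac`` summary keys are assumptions (only ``reward_mean`` appears in the examples above).

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class ToyConfig:
    """Hypothetical stand-in for EvaluatorConfig (documented field names and defaults)."""
    num_episodes: int = 256
    max_num_steps: int = 256
    temperature: float = 1.0
    log_action_id: int = 1


class ToySummarizer:
    """Hypothetical stand-in that mirrors the documented call order only."""

    def __init__(self, config: ToyConfig) -> None:
        self.log_action_id = config.log_action_id
        self.clear()

    def clear(self) -> None:
        # Drop everything aggregated so far.
        self.episode_returns: List[float] = []
        self.tracked_actions = 0
        self.total_actions = 0

    def initialize_episode(self, num_envs: int) -> None:
        # One running return and one "still running" flag per parallel environment.
        self._returns = [0.0] * num_envs
        self._active = [True] * num_envs

    def add_episode_step(self, actions, rewards, has_done) -> None:
        for i, (action, reward, done) in enumerate(zip(actions, rewards, has_done)):
            if not self._active[i]:
                continue  # ignore steps after this environment has finished
            self._returns[i] += reward
            self.total_actions += 1
            self.tracked_actions += int(action == self.log_action_id)
            if done:
                self._active[i] = False

    def finalize_episode(self) -> None:
        # Move the per-environment returns into the aggregate log.
        self.episode_returns.extend(self._returns)

    def summarize(self) -> Dict[str, Any]:
        n = len(self.episode_returns)
        total = self.total_actions
        return {
            "num_episodes": n,
            "reward_mean": sum(self.episode_returns) / n if n else 0.0,
            "log_action_frac": self.tracked_actions / total if total else 0.0,
        }


def run_demo() -> Dict[str, Any]:
    summarizer = ToySummarizer(ToyConfig(log_action_id=1))
    summarizer.clear()
    summarizer.initialize_episode(num_envs=2)
    # Step 1: both environments are running; environment 1 finishes here.
    summarizer.add_episode_step(actions=[1, 0], rewards=[1.0, 0.5], has_done=[False, True])
    # Step 2: only environment 0 is still counted.
    summarizer.add_episode_step(actions=[1, 1], rewards=[2.0, 9.0], has_done=[True, True])
    summarizer.finalize_episode()
    return summarizer.summarize()


if __name__ == "__main__":
    print(run_demo())
```

In this toy version, rewards and actions reported after an environment signals ``has_done`` are ignored, so the reward of 9.0 in the second step does not affect ``reward_mean``. Whether the real summarizer masks finished episodes in the same way is not stated in this reference.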