duo_ai
======

.. py:module:: duo_ai


Submodules
----------

.. toctree::
   :maxdepth: 1

   /autoapi/duo_ai/algorithms/index
   /autoapi/duo_ai/core/index
   /autoapi/duo_ai/environments/index
   /autoapi/duo_ai/models/index
   /autoapi/duo_ai/policies/index
   /autoapi/duo_ai/utils/index


Attributes
----------

.. autoapisummary::

   duo_ai.__version__


Classes
-------

.. autoapisummary::

   duo_ai.MasterConfig
   duo_ai.CoordEnv
   duo_ai.GeneralCoordEnv
   duo_ai.Evaluator


Functions
---------

.. autoapisummary::

   duo_ai.configure
   duo_ai.get_global_variable
   duo_ai.make_config
   duo_ai.make_algorithm
   duo_ai.make_policy
   duo_ai.load_policy
   duo_ai.register_environment
   duo_ai.register_algorithm
   duo_ai.register_policy
   duo_ai.register_model


Package Contents
----------------

.. py:class:: MasterConfig

   Main configuration class for the Duo framework.

   This class holds all experiment-level configuration, including environment, policy, algorithm, evaluation, and coordination settings.

   :param name: Name of the experiment. Default is "default".
   :type name: str, optional
   :param device: Device index for CUDA. Default is 0.
   :type device: int, optional
   :param seed: Random seed for reproducibility. Default is 10.
   :type seed: int, optional
   :param env: Environment configuration or name. Default is "procgen".
   :type env: Any, optional
   :param policy: Policy configuration or name. Default is "PPOPolicy".
   :type policy: Any, optional
   :param algorithm: Algorithm configuration or name. Default is "PPOAlgorithm".
   :type algorithm: Any, optional
   :param evaluation: Evaluation configuration. Default is None.
   :type evaluation: Any, optional
   :param eval_name: Name for evaluation run. Default is None.
   :type eval_name: str, optional
   :param overwrite: Whether to overwrite existing experiment directory. Default is False.
   :type overwrite: bool, optional
   :param use_wandb: Whether to use Weights & Biases logging. Default is False.
   :type use_wandb: bool, optional
   :param experiment_dir: Path to the experiment directory. Default is "".
   :type experiment_dir: str, optional
   :param train_novice: Path to novice training checkpoint. Default is None.
   :type train_novice: str, optional
   :param train_expert: Path to expert training checkpoint. Default is None.
   :type train_expert: str, optional
   :param test_novice: Path to novice test checkpoint. Default is None.
   :type test_novice: str, optional
   :param test_expert: Path to expert test checkpoint. Default is None.
   :type test_expert: str, optional
   :param coordination: Coordination configuration. Default is None.
   :type coordination: Any, optional

   .. rubric:: Examples

   >>> config = MasterConfig(name="my_experiment", env="procgen", policy="PPOPolicy")


   .. py:attribute:: name
      :type:  str
      :value: 'default'


   .. py:attribute:: device
      :type:  int
      :value: 0


   .. py:attribute:: seed
      :type:  int
      :value: 10


   .. py:attribute:: env
      :type:  Any
      :value: 'procgen'


   .. py:attribute:: policy
      :type:  Any
      :value: 'PPOPolicy'


   .. py:attribute:: algorithm
      :type:  Any
      :value: 'PPOAlgorithm'


   .. py:attribute:: evaluation
      :type:  Any
      :value: None


   .. py:attribute:: eval_mode
      :type:  Optional[int]
      :value: None


   .. py:attribute:: eval_name
      :type:  Optional[str]
      :value: None


   .. py:attribute:: overwrite
      :type:  bool
      :value: False


   .. py:attribute:: use_wandb
      :type:  bool
      :value: False


   .. py:attribute:: experiment_dir
      :type:  str
      :value: ''


   .. py:attribute:: train_novice
      :type:  Optional[str]
      :value: None


   .. py:attribute:: train_expert
      :type:  Optional[str]
      :value: None


   .. py:attribute:: test_novice
      :type:  Optional[str]
      :value: None


   .. py:attribute:: test_expert
      :type:  Optional[str]
      :value: None


   .. py:attribute:: coordination
      :type:  Any
      :value: None


   .. py:method:: __post_init__() -> None

      Post-initialization logic for MasterConfig.

      Converts string or dictionary fields for env, policy, algorithm, evaluation, and coordination into their respective configuration objects.

      :raises IndexError: If required keys are missing in configuration dictionaries.
      :raises ValueError: If configuration fields are not of expected types.
      .. rubric:: Examples

      >>> config = MasterConfig(env={"name": "procgen"})
      >>> config.__post_init__()


.. py:function:: configure(config: MasterConfig) -> None

   Set up experiment directory, logging, random seeds, and global variables for the experiment.

   :param config: The experiment configuration object.
   :type config: MasterConfig

   :rtype: None

   :raises FileExistsError: If the experiment directory exists and overwrite is not set.

   .. rubric:: Examples

   >>> configure(config)


.. py:class:: CoordEnv(config: CoordinationConfig, base_env: gymnasium.Env, novice: duo.core.Policy, expert: duo.core.Policy, open_novice: bool = True, open_expert: bool = False)

   Bases: :py:obj:`gymnasium.Env`

   Environment for coordinating between novice and expert policies.

   This class wraps a base environment and enables switching between a novice and expert policy, applying costs for expert queries and agent switching.

   .. rubric:: Examples

   >>> config = CoordinationConfig()
   >>> base_env = gym.make(...)
   >>> novice = ...
   >>> expert = ...
   >>> env = CoordEnv(config, base_env, novice, expert)


   .. py:attribute:: config_cls


   .. py:attribute:: NOVICE
      :value: 0


   .. py:attribute:: EXPERT
      :value: 1


   .. py:attribute:: config


   .. py:attribute:: base_env


   .. py:attribute:: novice


   .. py:attribute:: expert


   .. py:attribute:: open_novice
      :value: True


   .. py:attribute:: open_expert
      :value: False


   .. py:attribute:: action_space


   .. py:attribute:: observation_space


   .. py:attribute:: expert_query_cost_per_action
      :value: None


   .. py:attribute:: switch_agent_cost_per_action
      :value: None


   .. py:property:: num_envs
      :type: int

      Number of parallel environments.

      :returns: Number of parallel environments.
      :rtype: int

      .. rubric:: Examples

      >>> n = env.num_envs


   .. py:method:: set_costs(base_penalty: float) -> None

      Set the cost per action for expert queries and agent switching.

      :param base_penalty: The reward value per action.
      :type base_penalty: float

      :rtype: None

      .. rubric:: Examples

      >>> env.set_costs(0.05)


   .. py:method:: reset() -> Dict[str, Any]

      Reset the coordination environment to an initial state.

      :returns: The initial observation of the environment, including:

                - "base_obs": The initial observation from the base environment.
                - "novice_hidden": Numpy array of hidden features from the novice policy (if open_novice).
                - "novice_logits": Numpy array of output logits from the novice policy (if open_novice).
                - "expert_hidden": Numpy array of hidden features from the expert policy (if open_expert).
                - "expert_logits": Numpy array of output logits from the expert policy (if open_expert).
      :rtype: dict

      .. rubric:: Examples

      >>> obs = env.reset()


   .. py:method:: _reset_agents(done: numpy.ndarray) -> None

      Reset the internal state of the novice and expert agents.

      :param done: Boolean array indicating which episodes in a batch require a reset.
      :type done: numpy.ndarray

      :rtype: None

      .. rubric:: Examples

      >>> env._reset_agents(np.array([True, False]))


   .. py:method:: step(action: numpy.ndarray) -> Tuple[Dict[str, Any], numpy.ndarray, numpy.ndarray, List[Dict[str, Any]]]

      Advance the environment by one step using the provided action.

      :param action: The action(s) to take in the environment. Should be a numpy array indicating which agent acts.
      :type action: numpy.ndarray

      :returns: * **obs** (*dict*) -- The next observation of the environment, including:

                  - "base_obs": The observation from the base environment.
                  - "novice_hidden": Numpy array of hidden features from the novice policy (if open_novice).
                  - "novice_logits": Numpy array of output logits from the novice policy (if open_novice).
                  - "expert_hidden": Numpy array of hidden features from the expert policy (if open_expert).
                  - "expert_logits": Numpy array of output logits from the expert policy (if open_expert).

                * **reward** (*numpy.ndarray*) -- The reward(s) obtained from the environment after taking the action.
                * **done** (*numpy.ndarray*) -- Boolean flag(s) indicating whether the episode has ended for each environment.
                * **info** (*list of dict*) -- Additional information from the environment for each agent or environment instance.

      :raises Exception: Propagates any exceptions raised by the underlying environment's `step` method.

      .. rubric:: Examples

      >>> obs, reward, done, info = env.step(action)


   .. py:method:: _compute_base_action(action: numpy.ndarray) -> numpy.ndarray

      Compute the environment-specific action for each agent.

      :param action: Array indicating which agent (novice or expert) acts for each environment.
      :type action: numpy.ndarray

      :returns: Array of actions to be passed to the base environment.
      :rtype: numpy.ndarray

      .. rubric:: Examples

      >>> base_action = env._compute_base_action(action)


   .. py:method:: _get_obs() -> Dict[str, Any]

      Return the current observation for the coordination environment.

      :returns: A dictionary containing:

                - "base_obs": The current observation from the base environment.
                - "novice_hidden": Numpy array of hidden features from the novice policy (if open_novice).
                - "novice_logits": Numpy array of output logits from the novice policy (if open_novice).
                - "expert_hidden": Numpy array of hidden features from the expert policy (if open_expert).
                - "expert_logits": Numpy array of output logits from the expert policy (if open_expert).
      :rtype: dict

      .. rubric:: Examples

      >>> obs = env._get_obs()


   .. py:method:: _get_reward(base_reward: numpy.ndarray, action: numpy.ndarray, done: numpy.ndarray) -> numpy.ndarray

      Compute the reward for the current step, including costs for expert queries and agent switching.

      :param base_reward: The base reward from the environment.
      :type base_reward: numpy.ndarray
      :param action: The action(s) taken (novice or expert).
      :type action: numpy.ndarray
      :param done: Boolean flag(s) indicating whether the episode has ended for each environment.
      :type done: numpy.ndarray

      :returns: The computed reward(s) after applying costs.
      :rtype: numpy.ndarray

      .. rubric:: Examples

      >>> reward = env._get_reward(base_reward, action, done)


   .. py:method:: close() -> None

      Close the coordination environment and release any resources held.

      :rtype: None

      .. rubric:: Examples

      >>> env.close()


.. py:class:: GeneralCoordEnv(config: CoordinationConfig, base_env: gymnasium.Env, novice: duo.core.Policy, expert: duo.core.Policy, open_novice: bool = True, open_expert: bool = False)

   Bases: :py:obj:`CoordEnv`

   Coordination environment supporting recurrent policies.

   This class supports policies that maintain a hidden state across steps, but can be less efficient for stateless policies than `CoordEnv`.

   .. rubric:: Examples

   >>> config = CoordinationConfig()
   >>> base_env = gym.make(...)
   >>> novice = ...
   >>> expert = ...
   >>> env = GeneralCoordEnv(config, base_env, novice, expert)


   .. py:method:: _compute_agents_action() -> numpy.ndarray

      Compute the actions for both novice and expert agents, supporting recurrent policies.

      :returns: Array of actions to be passed to the base environment.
      :rtype: numpy.ndarray

      .. rubric:: Examples

      >>> base_action = env._compute_agents_action()


   .. py:method:: _compute_base_action(action: numpy.ndarray) -> numpy.ndarray

      Compute the environment-specific action for each agent, supporting recurrent policies.

      :param action: Array indicating which agent (novice or expert) acts for each environment.
      :type action: numpy.ndarray

      :returns: Array of actions to be passed to the base environment.
      :rtype: numpy.ndarray

      .. rubric:: Examples

      >>> base_action = env._compute_base_action(action)


   .. py:method:: _get_obs() -> Dict[str, Any]

      Return the current observation for the coordination environment, supporting recurrent policies.

      :returns: A dictionary containing:

                - "base_obs": The current observation from the base environment.
                - "novice_hidden": Numpy array of hidden features from the novice policy (if open_novice).
                - "novice_logits": Numpy array of output logits from the novice policy (if open_novice).
                - "expert_hidden": Numpy array of hidden features from the expert policy (if open_expert).
                - "expert_logits": Numpy array of output logits from the expert policy (if open_expert).
      :rtype: dict

      .. rubric:: Examples

      >>> obs = env._get_obs()


.. py:class:: Evaluator(config: EvaluatorConfig, env: gym.Env)

   Evaluator for running policy evaluation on environments and summarizing results.

   .. rubric:: Examples

   >>> evaluator = Evaluator(EvaluatorConfig(), env)
   >>> summary = evaluator.evaluate(policy)


   .. py:attribute:: config_cls


   .. py:attribute:: config


   .. py:attribute:: env


   .. py:method:: evaluate(policy: duo_ai.core.Policy, num_episodes: Optional[int] = None) -> Dict[str, Any]

      Evaluate a policy on the environment and summarize the results.

      :param policy: The policy to evaluate. Must implement an `act` method and have a `.model` attribute.
      :type policy: duo_ai.core.Policy
      :param num_episodes: Number of episodes to run. If None, uses value from config.
      :type num_episodes: int, optional

      :returns: A dictionary mapping split names to summary statistics for each evaluation.
      :rtype: dict

      .. rubric:: Examples

      >>> summary = evaluator.evaluate(policy, num_episodes=100)
      >>> print(summary['reward_mean'])


   .. py:method:: _eval_one_iteration(policy: duo_ai.core.Policy, env: gym.Env) -> None

      Run a single evaluation iteration for the policy on the environment.

      :param policy: The policy to evaluate.
      :type policy: duo_ai.core.Policy
      :param env: The environment instance to evaluate on.
      :type env: gym.Env

      :rtype: None


.. py:function:: get_global_variable(key)

   Retrieve the value of a global variable by key.

   :param key: The key for the global variable.
   :type key: str

   :returns: The value of the global variable, or None if not set.
   :rtype: Any or None

   .. rubric:: Examples

   >>> get_global_variable('device')
   'cuda'


.. py:data:: __version__

.. py:function:: make_config(args: object, dotlist_args: object = None) -> core.config.MasterConfig

   Create and configure a MasterConfig object from command-line arguments.

   :param args: Arguments object with a 'config' attribute.
   :type args: object
   :param dotlist_args: Additional dotlist arguments for configuration.
   :type dotlist_args: object, optional

   :returns: Configured MasterConfig object.
   :rtype: MasterConfig

   .. rubric:: Examples

   >>> config = make_config(args)


.. py:function:: make_algorithm(config: object) -> object

   Instantiate an algorithm from the registry using the provided config.

   :param config: Algorithm configuration object with a 'name' attribute.
   :type config: object

   :returns: Instantiated algorithm.
   :rtype: object

   .. rubric:: Examples

   >>> algo = make_algorithm(config)


.. py:function:: make_policy(config: object, env: object) -> object

   Instantiate a policy from the registry using the provided config and environment.

   :param config: Policy configuration object with a 'name' attribute.
   :type config: object
   :param env: Environment instance.
   :type env: object

   :returns: Instantiated policy.
   :rtype: object

   .. rubric:: Examples

   >>> policy = make_policy(config, env)


.. py:function:: load_policy(path: str, env: object) -> object

   Load a policy from a checkpoint file.

   :param path: Path to the checkpoint file.
   :type path: str
   :param env: Environment instance.
   :type env: object

   :returns: Loaded policy.
   :rtype: object

   .. rubric:: Examples

   >>> policy = load_policy("checkpoint.ckpt", env)


.. py:function:: register_environment(name: str, config_cls: object) -> None

   Register an environment configuration class in the registry.

   :param name: Name of the environment.
   :type name: str
   :param config_cls: Environment configuration class.
   :type config_cls: object

   :rtype: None

   .. rubric:: Examples

   >>> register_environment("myenv", MyEnvConfig)


.. py:function:: register_algorithm(name: str, algorithm_cls: object) -> None

   Register an algorithm class in the registry.

   :param name: Name of the algorithm.
   :type name: str
   :param algorithm_cls: Algorithm class.
   :type algorithm_cls: object

   :rtype: None

   .. rubric:: Examples

   >>> register_algorithm("ppo", PPOAlgorithm)


.. py:function:: register_policy(name: str, policy_cls: object) -> None

   Register a policy class in the registry.

   :param name: Name of the policy.
   :type name: str
   :param policy_cls: Policy class.
   :type policy_cls: object

   :rtype: None

   .. rubric:: Examples

   >>> register_policy("ppo", PPOPolicy)


.. py:function:: register_model(name: str, model_cls: object) -> None

   Register a model class in the registry.

   :param name: Name of the model.
   :type name: str
   :param model_cls: Model class.
   :type model_cls: object

   :rtype: None

   .. rubric:: Examples

   >>> register_model("mlp", MLPModel)
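
The ``register_*`` and ``make_*`` functions above describe a name-based registry: a class is stored under a string key and later instantiated from a config whose ``name`` attribute selects it. The sketch below illustrates that pattern in plain Python. It is not the ``duo_ai`` implementation: ``Registry``, ``ToyPolicyConfig``, ``ToyPolicy``, ``POLICIES``, and ``make_policy_sketch`` are hypothetical names introduced only for this example, and the duplicate-name and unknown-name error behavior is an assumption.

```python
from dataclasses import dataclass
from typing import Any, Dict


class Registry:
    """Hypothetical name-to-class registry (illustrative, not duo_ai's code)."""

    def __init__(self, kind: str) -> None:
        self.kind = kind
        self._classes: Dict[str, type] = {}

    def register(self, name: str, cls: type) -> None:
        # Assumed behavior: refuse silent overwrites of an existing entry.
        if name in self._classes:
            raise ValueError(f"{self.kind} {name!r} is already registered")
        self._classes[name] = cls

    def make(self, name: str, *args: Any, **kwargs: Any) -> Any:
        # Look the class up by name and instantiate it with the given arguments.
        if name not in self._classes:
            raise KeyError(f"unknown {self.kind} {name!r}")
        return self._classes[name](*args, **kwargs)


@dataclass
class ToyPolicyConfig:
    # Hypothetical config: only the 'name' attribute matters for the lookup.
    name: str = "toy"


class ToyPolicy:
    # Hypothetical policy class that accepts (config, env) like make_policy.
    def __init__(self, config: ToyPolicyConfig, env: Any) -> None:
        self.config = config
        self.env = env


POLICIES = Registry("policy")
POLICIES.register("toy", ToyPolicy)


def make_policy_sketch(config: ToyPolicyConfig, env: Any) -> ToyPolicy:
    # Mirrors the documented shape of make_policy(config, env): the config's
    # 'name' attribute selects the registered class.
    return POLICIES.make(config.name, config, env)


policy = make_policy_sketch(ToyPolicyConfig(), env=None)
print(type(policy).__name__)
```

Keeping one registry per kind (environment, algorithm, policy, model) would let a string such as ``"PPOPolicy"`` in a configuration resolve to a class without importing it at the call site, which is presumably why ``MasterConfig`` accepts plain names for ``env``, ``policy``, and ``algorithm``.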