duo_ai
======

.. py:module:: duo_ai


Submodules
----------

.. toctree::
   :maxdepth: 1

   /autoapi/duo_ai/algorithms/index
   /autoapi/duo_ai/core/index
   /autoapi/duo_ai/environments/index
   /autoapi/duo_ai/models/index
   /autoapi/duo_ai/policies/index
   /autoapi/duo_ai/utils/index


Attributes
----------

.. autoapisummary::

   duo_ai.__version__


Classes
-------

.. autoapisummary::

   duo_ai.MasterConfig
   duo_ai.CoordEnv
   duo_ai.GeneralCoordEnv
   duo_ai.Evaluator


Functions
---------

.. autoapisummary::

   duo_ai.configure
   duo_ai.get_global_variable
   duo_ai.make_config
   duo_ai.make_algorithm
   duo_ai.make_policy
   duo_ai.load_policy
   duo_ai.register_environment
   duo_ai.register_algorithm
   duo_ai.register_policy
   duo_ai.register_model


Package Contents
----------------

.. py:class:: MasterConfig

   Main configuration class for the Duo framework.

   This class holds all experiment-level configuration, including environment, policy, algorithm, evaluation, and coordination settings.

   :param name: Name of the experiment. Default is "default".
   :type name: str, optional
   :param device: Device index for CUDA. Default is 0.
   :type device: int, optional
   :param seed: Random seed for reproducibility. Default is 10.
   :type seed: int, optional
   :param env: Environment configuration or name. Default is "procgen".
   :type env: Any, optional
   :param policy: Policy configuration or name. Default is "PPOPolicy".
   :type policy: Any, optional
   :param algorithm: Algorithm configuration or name. Default is "PPOAlgorithm".
   :type algorithm: Any, optional
   :param evaluation: Evaluation configuration. Default is None.
   :type evaluation: Any, optional
   :param eval_name: Name for evaluation run. Default is None.
   :type eval_name: str, optional
   :param overwrite: Whether to overwrite existing experiment directory. Default is False.
   :type overwrite: bool, optional
   :param use_wandb: Whether to use Weights & Biases logging. Default is False.
   :type use_wandb: bool, optional
   :param experiment_dir: Path to the experiment directory. Default is "".
   :type experiment_dir: str, optional
   :param train_novice: Path to novice training checkpoint. Default is None.
   :type train_novice: str, optional
   :param train_expert: Path to expert training checkpoint. Default is None.
   :type train_expert: str, optional
   :param test_novice: Path to novice test checkpoint. Default is None.
   :type test_novice: str, optional
   :param test_expert: Path to expert test checkpoint. Default is None.
   :type test_expert: str, optional
   :param coordination: Coordination configuration. Default is None.
   :type coordination: Any, optional

   .. rubric:: Examples

   >>> config = MasterConfig(name="my_experiment", env="procgen", policy="PPOPolicy")


   .. py:attribute:: name
      :type:  str
      :value: 'default'


   .. py:attribute:: device
      :type:  int
      :value: 0


   .. py:attribute:: seed
      :type:  int
      :value: 10


   .. py:attribute:: env
      :type:  Any
      :value: 'procgen'


   .. py:attribute:: policy
      :type:  Any
      :value: 'PPOPolicy'


   .. py:attribute:: algorithm
      :type:  Any
      :value: 'PPOAlgorithm'


   .. py:attribute:: evaluation
      :type:  Any
      :value: None


   .. py:attribute:: eval_mode
      :type:  Optional[int]
      :value: None


   .. py:attribute:: eval_name
      :type:  Optional[str]
      :value: None


   .. py:attribute:: overwrite
      :type:  bool
      :value: False


   .. py:attribute:: use_wandb
      :type:  bool
      :value: False


   .. py:attribute:: experiment_dir
      :type:  str
      :value: ''


   .. py:attribute:: train_novice
      :type:  Optional[str]
      :value: None


   .. py:attribute:: train_expert
      :type:  Optional[str]
      :value: None


   .. py:attribute:: test_novice
      :type:  Optional[str]
      :value: None


   .. py:attribute:: test_expert
      :type:  Optional[str]
      :value: None


   .. py:attribute:: coordination
      :type:  Any
      :value: None


   .. py:method:: __post_init__() -> None

      Post-initialization logic for ``MasterConfig``. It runs automatically at the end of ``__init__``, so it does not need to be called directly.

      Converts string or dictionary fields for env, policy, algorithm, evaluation, and coordination
      into their respective configuration objects.

      :raises IndexError: If required keys are missing in configuration dictionaries.
      :raises ValueError: If configuration fields are not of expected types.
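
      The conversion can be pictured with the minimal sketch below. It is not the package's
      implementation. ``EnvConfig``, ``ENV_CONFIGS`` and ``SketchConfig`` are hypothetical stand-ins,
      and only the ``env`` field is handled.

      .. code-block:: python

         from dataclasses import dataclass
         from typing import Any


         @dataclass
         class EnvConfig:
             # Hypothetical environment config; real classes come from the registry.
             name: str = "procgen"
             num_envs: int = 1


         # Hypothetical mapping from environment names to config classes.
         ENV_CONFIGS = {"procgen": EnvConfig}


         @dataclass
         class SketchConfig:
             env: Any = "procgen"

             def __post_init__(self) -> None:
                 # Runs automatically at the end of the generated __init__.
                 if isinstance(self.env, str):
                     self.env = ENV_CONFIGS[self.env]()
                 elif isinstance(self.env, dict):
                     cls = ENV_CONFIGS[self.env["name"]]
                     self.env = cls(**self.env)
                 elif not isinstance(self.env, EnvConfig):
                     raise ValueError(f"Unsupported env value: {self.env!r}")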

      .. rubric:: Examples

      >>> config = MasterConfig(env={"name": "procgen"})  # __post_init__ runs automatically


.. py:function:: configure(config: MasterConfig) -> None

   Set up experiment directory, logging, random seeds, and global variables for the experiment.

   :param config: The experiment configuration object.
   :type config: MasterConfig

   :rtype: None

   :raises FileExistsError: If the experiment directory exists and overwrite is not set.
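
   The directory and seeding steps can be sketched as follows. ``setup_experiment`` is a
   hypothetical helper, not part of ``duo_ai``. The real function also configures logging and
   global variables, and presumably seeds other libraries (such as NumPy or PyTorch) as well.

   .. code-block:: python

      import os
      import random
      import tempfile


      def setup_experiment(experiment_dir: str, seed: int, overwrite: bool = False) -> None:
          """Hypothetical stand-in for the directory and seeding steps of ``configure``."""
          if os.path.exists(experiment_dir) and not overwrite:
              raise FileExistsError(f"{experiment_dir} exists; set overwrite=True to reuse it")
          os.makedirs(experiment_dir, exist_ok=True)
          # Only Python's built-in RNG is seeded here; other libraries would be seeded alike.
          random.seed(seed)


      root = tempfile.mkdtemp()
      exp_dir = os.path.join(root, "my_experiment")
      setup_experiment(exp_dir, seed=10)
      first = random.random()
      setup_experiment(exp_dir, seed=10, overwrite=True)  # reuse is allowed with overwrite
      second = random.random()

   Reseeding with the same value makes ``first`` and ``second`` identical, which is what makes
   repeated runs reproducible.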

   .. rubric:: Examples

   >>> configure(config)


.. py:class:: CoordEnv(config: CoordinationConfig, base_env: gymnasium.Env, novice: duo_ai.core.Policy, expert: duo_ai.core.Policy, open_novice: bool = True, open_expert: bool = False)

   Bases: :py:obj:`gymnasium.Env`


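   Environment for coordinating a novice and an expert policy.

   This class wraps a base environment. At each step, the action selects which agent acts in each
   parallel environment (``NOVICE`` = 0 or ``EXPERT`` = 1). The selected agent's action is passed to
   the base environment, and per-action costs are applied for querying the expert and for switching agents.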

   .. rubric:: Examples

   >>> config = CoordinationConfig()
   >>> base_env = gym.make(...)
   >>> novice = ...
   >>> expert = ...
   >>> env = CoordEnv(config, base_env, novice, expert)


   .. py:attribute:: config_cls


   .. py:attribute:: NOVICE
      :value: 0


   .. py:attribute:: EXPERT
      :value: 1


   .. py:attribute:: config


   .. py:attribute:: base_env


   .. py:attribute:: novice


   .. py:attribute:: expert


   .. py:attribute:: open_novice
      :value: True


   .. py:attribute:: open_expert
      :value: False


   .. py:attribute:: action_space


   .. py:attribute:: observation_space


   .. py:attribute:: expert_query_cost_per_action
      :value: None


   .. py:attribute:: switch_agent_cost_per_action
      :value: None


   .. py:property:: num_envs
      :type: int


      Number of parallel environments.

      :returns: Number of parallel environments.
      :rtype: int

      .. rubric:: Examples

      >>> n = env.num_envs


   .. py:method:: set_costs(base_penalty: float) -> None

      Set the cost per action for expert queries and agent switching.

      :param base_penalty: Base penalty per action, used to set ``expert_query_cost_per_action`` and ``switch_agent_cost_per_action``.
      :type base_penalty: float

      :rtype: None

      .. rubric:: Examples

      >>> env.set_costs(0.05)


   .. py:method:: reset() -> Dict[str, Any]

      Reset the coordination environment to an initial state.

      :returns:
         The initial observation of the environment, a dictionary with the keys:

         - ``"base_obs"``: Initial observation from the base environment.
         - ``"novice_hidden"``: NumPy array of hidden features from the novice policy (if ``open_novice``).
         - ``"novice_logits"``: NumPy array of output logits from the novice policy (if ``open_novice``).
         - ``"expert_hidden"``: NumPy array of hidden features from the expert policy (if ``open_expert``).
         - ``"expert_logits"``: NumPy array of output logits from the expert policy (if ``open_expert``).

      :rtype: dict

      .. rubric:: Examples

      >>> obs = env.reset()


   .. py:method:: _reset_agents(done: numpy.ndarray) -> None

      Reset the internal state of the novice and expert agents.

      :param done: Boolean array indicating which episodes in a batch require a reset.
      :type done: numpy.ndarray

      :rtype: None
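
      For a recurrent agent, resetting usually means zeroing the hidden state of the environments
      whose episodes ended and leaving the others untouched. Below is a minimal sketch that assumes
      a hypothetical hidden-state array with one row per environment. ``reset_hidden`` is not part
      of ``duo_ai``.

      .. code-block:: python

         import numpy as np


         def reset_hidden(hidden: np.ndarray, done: np.ndarray) -> np.ndarray:
             """Return a copy of ``hidden`` with rows zeroed where ``done`` is True."""
             hidden = hidden.copy()
             hidden[done] = 0.0  # the boolean mask selects the rows to reset
             return hidden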

      .. rubric:: Examples

      >>> env._reset_agents(np.array([True, False]))


   .. py:method:: step(action: numpy.ndarray) -> Tuple[Dict[str, Any], numpy.ndarray, numpy.ndarray, List[Dict[str, Any]]]

      Advance the environment by one step using the provided action.

      :param action: Array with one entry per parallel environment that selects the acting agent: ``NOVICE`` (0) or ``EXPERT`` (1).
      :type action: numpy.ndarray

      :returns: * **obs** (*dict*) -- The next observation of the environment, a dictionary with the keys:

                  - ``"base_obs"``: Observation from the base environment.
                  - ``"novice_hidden"``: NumPy array of hidden features from the novice policy (if ``open_novice``).
                  - ``"novice_logits"``: NumPy array of output logits from the novice policy (if ``open_novice``).
                  - ``"expert_hidden"``: NumPy array of hidden features from the expert policy (if ``open_expert``).
                  - ``"expert_logits"``: NumPy array of output logits from the expert policy (if ``open_expert``).

                * **reward** (*numpy.ndarray*) -- Reward(s) for the step: the base environment's reward after expert-query and agent-switching costs are applied.
                * **done** (*numpy.ndarray*) -- Boolean flag(s) indicating whether the episode has ended for each environment.
                * **info** (*list of dict*) -- Additional information from the environment for each agent or environment instance.

      :raises Exception: Propagates any exception raised by the underlying environment's ``step`` method.

      .. rubric:: Examples

      >>> obs, reward, done, info = env.step(action)


   .. py:method:: _compute_base_action(action: numpy.ndarray) -> numpy.ndarray

      Compute the environment-specific action for each agent.

      :param action: Array indicating which agent (novice or expert) acts for each environment.
      :type action: numpy.ndarray

      :returns: Array of actions to be passed to the base environment.
      :rtype: numpy.ndarray
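
      Suppose both agents' actions have already been computed for every environment (the
      hypothetical arrays ``novice_action`` and ``expert_action`` below). The selection then
      reduces to a per-environment choice. This sketches the idea, not the method's actual code.

      .. code-block:: python

         import numpy as np

         NOVICE, EXPERT = 0, 1  # mirrors CoordEnv.NOVICE and CoordEnv.EXPERT


         def compute_base_action(action, novice_action, expert_action):
             """Pick, per environment, the action of the agent selected by ``action``."""
             return np.where(action == EXPERT, expert_action, novice_action)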

      .. rubric:: Examples

      >>> base_action = env._compute_base_action(action)


   .. py:method:: _get_obs() -> Dict[str, Any]

      Return the current observation for the coordination environment.

      :returns:
         A dictionary containing:

         - ``"base_obs"``: The current observation from the base environment.
         - ``"novice_hidden"``: NumPy array of hidden features from the novice policy (if ``open_novice``).
         - ``"novice_logits"``: NumPy array of output logits from the novice policy (if ``open_novice``).
         - ``"expert_hidden"``: NumPy array of hidden features from the expert policy (if ``open_expert``).
         - ``"expert_logits"``: NumPy array of output logits from the expert policy (if ``open_expert``).

      :rtype: dict

      .. rubric:: Examples

      >>> obs = env._get_obs()


   .. py:method:: _get_reward(base_reward: numpy.ndarray, action: numpy.ndarray, done: numpy.ndarray) -> numpy.ndarray

      Compute the reward for the current step, including costs for expert queries and agent switching.

      :param base_reward: The base reward from the environment.
      :type base_reward: numpy.ndarray
      :param action: The action(s) taken (novice or expert).
      :type action: numpy.ndarray
      :param done: Boolean flag(s) indicating whether the episode has ended for each environment.
      :type done: numpy.ndarray

      :returns: The computed reward(s) after applying costs.
      :rtype: numpy.ndarray
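
      The exact cost rule is not specified here. The sketch below shows one plausible scheme,
      purely as an assumption:

      - the expert-query cost is charged whenever the expert acts;
      - the switching cost is charged whenever the acting agent differs from the previous step's agent;
      - both costs are subtracted from the base reward;
      - ``done`` is used to forget the previous agent at episode boundaries.

      ``RewardShaper`` is hypothetical. In ``CoordEnv`` the two costs are set by ``set_costs``.

      .. code-block:: python

         import numpy as np

         EXPERT = 1  # mirrors CoordEnv.EXPERT
         NO_AGENT = -1  # assumption: marks "no previous agent" at the start of an episode


         class RewardShaper:
             """Hypothetical reward shaping with per-action query and switch costs."""

             def __init__(self, num_envs: int, query_cost: float, switch_cost: float):
                 self.query_cost = query_cost
                 self.switch_cost = switch_cost
                 self.prev_action = np.full(num_envs, NO_AGENT)

             def __call__(self, base_reward, action, done):
                 # A switch happens when a previous agent exists and differs from the current one.
                 switched = (self.prev_action != NO_AGENT) & (action != self.prev_action)
                 reward = (
                     base_reward
                     - self.query_cost * (action == EXPERT)
                     - self.switch_cost * switched
                 )
                 # Remember who acted, but forget it wherever the episode just ended.
                 self.prev_action = np.where(done, NO_AGENT, action)
                 return reward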

      .. rubric:: Examples

      >>> reward = env._get_reward(base_reward, action, done)


   .. py:method:: close() -> None

      Close the coordination environment and release any resources held.

      :rtype: None

      .. rubric:: Examples

      >>> env.close()


.. py:class:: GeneralCoordEnv(config: CoordinationConfig, base_env: gymnasium.Env, novice: duo_ai.core.Policy, expert: duo_ai.core.Policy, open_novice: bool = True, open_expert: bool = False)

   Bases: :py:obj:`CoordEnv`


   Coordination environment supporting recurrent policies.

   This class supports policies that maintain a hidden state across steps by computing actions for both
   the novice and the expert (see ``_compute_agents_action``). For stateless policies it can be less
   efficient than :py:class:`CoordEnv`.

   .. rubric:: Examples

   >>> config = CoordinationConfig()
   >>> base_env = gym.make(...)
   >>> novice = ...
   >>> expert = ...
   >>> env = GeneralCoordEnv(config, base_env, novice, expert)


   .. py:method:: _compute_agents_action() -> numpy.ndarray

      Compute the actions for both novice and expert agents, supporting recurrent policies.

      :returns: Array of actions to be passed to the base environment.
      :rtype: numpy.ndarray
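
      A recurrent policy's hidden state must advance on every step, even in environments where
      that agent is not in control. Otherwise the state falls out of sync with the episode. The
      sketch below illustrates this by querying both agents at each step. ``ToyRecurrentPolicy``
      and ``compute_agents_action`` are hypothetical.

      .. code-block:: python

         import numpy as np


         class ToyRecurrentPolicy:
             """Hypothetical policy whose hidden state accumulates observations."""

             def __init__(self, num_envs: int, num_actions: int):
                 self.num_actions = num_actions
                 self.hidden = np.zeros(num_envs)

             def act(self, obs: np.ndarray) -> np.ndarray:
                 self.hidden = self.hidden + obs  # the state advances on every call
                 return self.hidden.astype(int) % self.num_actions


         def compute_agents_action(novice, expert, obs):
             """Query both agents so that neither hidden state falls behind."""
             return novice.act(obs), expert.act(obs)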

      .. rubric:: Examples

      >>> base_action = env._compute_agents_action()


   .. py:method:: _compute_base_action(action: numpy.ndarray) -> numpy.ndarray

      Compute the environment-specific action for each agent, supporting recurrent policies.

      :param action: Array indicating which agent (novice or expert) acts for each environment.
      :type action: numpy.ndarray

      :returns: Array of actions to be passed to the base environment.
      :rtype: numpy.ndarray

      .. rubric:: Examples

      >>> base_action = env._compute_base_action(action)


   .. py:method:: _get_obs() -> Dict[str, Any]

      Return the current observation for the coordination environment, supporting recurrent policies.

      :returns:
         A dictionary containing:

         - ``"base_obs"``: The current observation from the base environment.
         - ``"novice_hidden"``: NumPy array of hidden features from the novice policy (if ``open_novice``).
         - ``"novice_logits"``: NumPy array of output logits from the novice policy (if ``open_novice``).
         - ``"expert_hidden"``: NumPy array of hidden features from the expert policy (if ``open_expert``).
         - ``"expert_logits"``: NumPy array of output logits from the expert policy (if ``open_expert``).

      :rtype: dict

      .. rubric:: Examples

      >>> obs = env._get_obs()


.. py:class:: Evaluator(config: EvaluatorConfig, env: gym.Env)

   Evaluator for running policy evaluation on environments and summarizing results.

   .. rubric:: Examples

   >>> evaluator = Evaluator(EvaluatorConfig(), env)
   >>> summary = evaluator.evaluate(policy)


   .. py:attribute:: config_cls


   .. py:attribute:: config


   .. py:attribute:: env


   .. py:method:: evaluate(policy: duo_ai.core.Policy, num_episodes: Optional[int] = None) -> Dict[str, Any]

      Evaluate a policy on the environment and summarize the results.

      :param policy: The policy to evaluate. Must implement an ``act`` method and have a ``model`` attribute.
      :type policy: duo_ai.core.Policy
      :param num_episodes: Number of episodes to run. If None, uses value from config.
      :type num_episodes: int, optional

      :returns: A dictionary mapping split names to summary statistics for each evaluation.
      :rtype: dict
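
      The core of an evaluation loop over a vectorized environment can be sketched as below. This
      is a hypothetical, self-contained illustration. ``CountdownEnv`` and ``evaluate`` are not part
      of ``duo_ai``, the environment follows the four-tuple ``step`` convention used by ``CoordEnv``,
      and only a single split is summarized.

      .. code-block:: python

         import numpy as np


         class CountdownEnv:
             """Hypothetical vectorized env: episodes last ``length`` steps, reward 1 per step."""

             def __init__(self, num_envs: int, length: int):
                 self.num_envs, self.length = num_envs, length
                 self.t = np.zeros(num_envs, dtype=int)

             def reset(self):
                 self.t[:] = 0
                 return self.t.copy()

             def step(self, action):
                 self.t += 1
                 done = self.t >= self.length
                 reward = np.ones(self.num_envs)
                 self.t[done] = 0  # auto-reset finished environments
                 return self.t.copy(), reward, done, [{} for _ in range(self.num_envs)]


         def evaluate(act, env, num_episodes: int) -> dict:
             """Run ``act`` until ``num_episodes`` episodes finish; return summary statistics."""
             obs = env.reset()
             running = np.zeros(env.num_envs)
             returns = []
             while len(returns) < num_episodes:
                 obs, reward, done, _ = env.step(act(obs))
                 running += reward
                 for i in np.flatnonzero(done):
                     returns.append(running[i])
                     running[i] = 0.0
             returns = returns[:num_episodes]
             return {"reward_mean": float(np.mean(returns)), "num_episodes": len(returns)}

      Truncating to exactly ``num_episodes`` keeps the sample size fixed, at the price of favoring
      environments whose episodes finish first.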

      .. rubric:: Examples

      >>> summary = evaluator.evaluate(policy, num_episodes=100)
      >>> for split, stats in summary.items():
      ...     print(split, stats)


   .. py:method:: _eval_one_iteration(policy: duo_ai.core.Policy, env: gym.Env) -> None

      Run a single evaluation iteration for the policy on the environment.

      :param policy: The policy to evaluate.
      :type policy: duo_ai.core.Policy
      :param env: The environment instance to evaluate on.
      :type env: gym.Env

      :rtype: None


.. py:function:: get_global_variable(key)

   Retrieve the value of a global variable by key.

   :param key: The key for the global variable.
   :type key: str

   :returns: The value of the global variable, or None if not set.
   :rtype: Any or None
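
   A module-level dictionary is one simple way to back such a lookup. In this sketch ``_GLOBALS``
   and ``set_global_variable`` are hypothetical. In ``duo_ai`` the values are set by ``configure``.

   .. code-block:: python

      from typing import Any, Dict

      _GLOBALS: Dict[str, Any] = {}  # hypothetical module-level store


      def set_global_variable(key: str, value: Any) -> None:
          _GLOBALS[key] = value


      def get_global_variable(key: str) -> Any:
          # dict.get returns None for missing keys, matching the documented behavior.
          return _GLOBALS.get(key)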

   .. rubric:: Examples

   >>> get_global_variable('device')
   'cuda'


.. py:data:: __version__

   Version of the ``duo_ai`` package.

.. py:function:: make_config(args: object, dotlist_args: object = None) -> core.config.MasterConfig

   Create and configure a MasterConfig object from command-line arguments.

   :param args: Arguments object with a 'config' attribute.
   :type args: object
   :param dotlist_args: Additional dotlist arguments for configuration.
   :type dotlist_args: object, optional

   :returns: Configured MasterConfig object.
   :rtype: MasterConfig
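
   As the name suggests, ``dotlist_args`` may hold OmegaConf-style ``key.subkey=value`` strings.
   Under that assumption, applying them to a nested configuration can be sketched as below.
   ``apply_dotlist`` is hypothetical and keeps values as strings; a real loader would typically
   also convert types.

   .. code-block:: python

      from typing import Any, Dict, List


      def apply_dotlist(config: Dict[str, Any], dotlist: List[str]) -> Dict[str, Any]:
          """Apply ``key.sub=value`` overrides to a nested dict in place and return it."""
          for item in dotlist:
              key, value = item.split("=", 1)
              *parents, leaf = key.split(".")
              node = config
              for part in parents:
                  node = node.setdefault(part, {})  # create missing levels as needed
              node[leaf] = value
          return config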

   .. rubric:: Examples

   >>> config = make_config(args)


.. py:function:: make_algorithm(config: object) -> object

   Instantiate an algorithm from the registry using the provided config.

   :param config: Algorithm configuration object with a 'name' attribute.
   :type config: object

   :returns: Instantiated algorithm.
   :rtype: object
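
   The registry pattern behind ``register_algorithm`` and ``make_algorithm`` can be sketched with a
   plain dictionary. Everything below is hypothetical, including the assumption that an algorithm
   class is constructed from its config alone.

   .. code-block:: python

      from types import SimpleNamespace
      from typing import Any, Callable, Dict

      ALGORITHMS: Dict[str, Callable[..., Any]] = {}  # hypothetical registry


      def register_algorithm(name: str, algorithm_cls: Callable[..., Any]) -> None:
          ALGORITHMS[name] = algorithm_cls


      def make_algorithm(config: Any) -> Any:
          # Look up the class by the config's ``name`` and build it from the config.
          try:
              cls = ALGORITHMS[config.name]
          except KeyError:
              raise ValueError(f"Unknown algorithm {config.name!r}") from None
          return cls(config)


      class DummyAlgorithm:
          def __init__(self, config: Any):
              self.config = config


      register_algorithm("dummy", DummyAlgorithm)
      algo = make_algorithm(SimpleNamespace(name="dummy", lr=1e-3))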

   .. rubric:: Examples

   >>> algo = make_algorithm(config)


.. py:function:: make_policy(config: object, env: object) -> object

   Instantiate a policy from the registry using the provided config and environment.

   :param config: Policy configuration object with a 'name' attribute.
   :type config: object
   :param env: Environment instance.
   :type env: object

   :returns: Instantiated policy.
   :rtype: object

   .. rubric:: Examples

   >>> policy = make_policy(config, env)


.. py:function:: load_policy(path: str, env: object) -> object

   Load a policy from a checkpoint file.

   :param path: Path to the checkpoint file.
   :type path: str
   :param env: Environment instance.
   :type env: object

   :returns: Loaded policy.
   :rtype: object

   .. rubric:: Examples

   >>> policy = load_policy("checkpoint.ckpt", env)


.. py:function:: register_environment(name: str, config_cls: object) -> None

   Register an environment configuration class in the registry.

   :param name: Name of the environment.
   :type name: str
   :param config_cls: Environment configuration class.
   :type config_cls: object

   :rtype: None

   .. rubric:: Examples

   >>> register_environment("myenv", MyEnvConfig)


.. py:function:: register_algorithm(name: str, algorithm_cls: object) -> None

   Register an algorithm class in the registry.

   :param name: Name of the algorithm.
   :type name: str
   :param algorithm_cls: Algorithm class.
   :type algorithm_cls: object

   :rtype: None

   .. rubric:: Examples

   >>> register_algorithm("ppo", PPOAlgorithm)


.. py:function:: register_policy(name: str, policy_cls: object) -> None

   Register a policy class in the registry.

   :param name: Name of the policy.
   :type name: str
   :param policy_cls: Policy class.
   :type policy_cls: object

   :rtype: None

   .. rubric:: Examples

   >>> register_policy("ppo", PPOPolicy)


.. py:function:: register_model(name: str, model_cls: object) -> None

   Register a model class in the registry.

   :param name: Name of the model.
   :type name: str
   :param model_cls: Model class.
   :type model_cls: object

   :rtype: None

   .. rubric:: Examples

   >>> register_model("mlp", MLPModel)