duo_ai

Submodules

Attributes

Classes

MasterConfig

Main configuration class for the Duo framework.

CoordEnv

Environment for coordinating between novice and expert policies.

GeneralCoordEnv

Coordination environment supporting recurrent policies.

Evaluator

Evaluator for running policy evaluation on environments and summarizing results.

Functions

configure(→ None)

Set up experiment directory, logging, random seeds, and global variables for the experiment.

get_global_variable(key)

Retrieve the value of a global variable by key.

make_config(→ core.config.MasterConfig)

Create and configure a MasterConfig object from command-line arguments.

make_algorithm(→ object)

Instantiate an algorithm from the registry using the provided config.

make_policy(→ object)

Instantiate a policy from the registry using the provided config and environment.

load_policy(→ object)

Load a policy from a checkpoint file.

register_environment(→ None)

Register an environment configuration class in the registry.

register_algorithm(→ None)

Register an algorithm class in the registry.

register_policy(→ None)

Register a policy class in the registry.

register_model(→ None)

Register a model class in the registry.

Package Contents

class duo_ai.MasterConfig

Main configuration class for the Duo framework.

This class holds all experiment-level configuration, including environment, policy, algorithm, evaluation, and coordination settings.

Parameters:
  • name (str, optional) – Name of the experiment. Default is “default”.

  • device (int, optional) – Device index for CUDA. Default is 0.

  • seed (int, optional) – Random seed for reproducibility. Default is 10.

  • env (Any, optional) – Environment configuration or name. Default is “procgen”.

  • policy (Any, optional) – Policy configuration or name. Default is “PPOPolicy”.

  • algorithm (Any, optional) – Algorithm configuration or name. Default is “PPOAlgorithm”.

  • evaluation (Any, optional) – Evaluation configuration. Default is None.

  • eval_mode (int, optional) – Evaluation mode. Default is None.

  • eval_name (str, optional) – Name for evaluation run. Default is None.

  • overwrite (bool, optional) – Whether to overwrite existing experiment directory. Default is False.

  • use_wandb (bool, optional) – Whether to use Weights & Biases logging. Default is False.

  • experiment_dir (str, optional) – Path to the experiment directory. Default is “”.

  • train_novice (str, optional) – Path to novice training checkpoint. Default is None.

  • train_expert (str, optional) – Path to expert training checkpoint. Default is None.

  • test_novice (str, optional) – Path to novice test checkpoint. Default is None.

  • test_expert (str, optional) – Path to expert test checkpoint. Default is None.

  • coordination (Any, optional) – Coordination configuration. Default is None.

Examples

>>> config = MasterConfig(name="my_experiment", env="procgen", policy="PPOPolicy")
name: str = 'default'
device: int = 0
seed: int = 10
env: Any = 'procgen'
policy: Any = 'PPOPolicy'
algorithm: Any = 'PPOAlgorithm'
evaluation: Any = None
eval_mode: int | None = None
eval_name: str | None = None
overwrite: bool = False
use_wandb: bool = False
experiment_dir: str = ''
train_novice: str | None = None
train_expert: str | None = None
test_novice: str | None = None
test_expert: str | None = None
coordination: Any = None
__post_init__() → None

Post-initialization logic for MasterConfig.

Converts string or dictionary fields for env, policy, algorithm, evaluation, and coordination into their respective configuration objects.

Raises:
  • IndexError – If required keys are missing in configuration dictionaries.

  • ValueError – If configuration fields are not of expected types.

Examples

>>> config = MasterConfig(env={"name": "procgen"})  # __post_init__ runs automatically on construction
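The conversion described above can be sketched with a plain dataclass. This is a minimal illustration rather than the Duo implementation: it assumes MasterConfig is a dataclass (so that `__post_init__` is invoked at the end of the generated `__init__`), it covers only the `env` field, and `EnvConfig` and `SketchConfig` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class EnvConfig:
    """Hypothetical environment config; not part of duo_ai."""
    name: str = "procgen"


@dataclass
class SketchConfig:
    """Hypothetical stand-in for MasterConfig with a single field."""
    env: Any = "procgen"

    def __post_init__(self) -> None:
        # Called automatically by the dataclass-generated __init__.
        if isinstance(self.env, str):
            self.env = EnvConfig(name=self.env)
        elif isinstance(self.env, dict):
            self.env = EnvConfig(**self.env)
        elif not isinstance(self.env, EnvConfig):
            raise ValueError(f"unsupported env type: {type(self.env).__name__}")


from_name = SketchConfig(env="procgen")
from_dict = SketchConfig(env={"name": "procgen"})
```

Both constructions end up with an `EnvConfig` instance in `env`, without any explicit call to `__post_init__`.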
duo_ai.configure(config: MasterConfig) → None

Set up experiment directory, logging, random seeds, and global variables for the experiment.

Parameters:

config (MasterConfig) – The experiment configuration object.

Return type:

None

Raises:

FileExistsError – If the experiment directory exists and overwrite is not set.

Examples

>>> configure(config)
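The directory check and seeding described above might look roughly like the following. This is a sketch under stated assumptions, not the body of configure: `setup_experiment` is a hypothetical helper, only Python's built-in RNG is seeded, and logging and global variables are left out.

```python
import os
import random
import tempfile


def setup_experiment(experiment_dir: str, seed: int, overwrite: bool = False) -> None:
    """Hypothetical helper mirroring the documented behaviour of configure()."""
    if os.path.exists(experiment_dir) and not overwrite:
        # Refuse to reuse an earlier run's directory unless explicitly allowed.
        raise FileExistsError(f"{experiment_dir} exists; pass overwrite=True to reuse it")
    os.makedirs(experiment_dir, exist_ok=True)
    # Seed Python's RNG; a real setup would likely seed other libraries as well.
    random.seed(seed)


root = tempfile.mkdtemp()
run_dir = os.path.join(root, "default")

setup_experiment(run_dir, seed=10)
first_draw = random.random()

try:
    setup_experiment(run_dir, seed=10)  # directory now exists, overwrite not set
    second_call_failed = False
except FileExistsError:
    second_call_failed = True

setup_experiment(run_dir, seed=10, overwrite=True)
repeat_draw = random.random()  # same seed, so the same first draw
```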
class duo_ai.CoordEnv(config: CoordinationConfig, base_env: gymnasium.Env, novice: duo.core.Policy, expert: duo.core.Policy, open_novice: bool = True, open_expert: bool = False)

Bases: gymnasium.Env

Environment for coordinating between novice and expert policies.

This class wraps a base environment and enables switching between a novice and expert policy, applying costs for expert queries and agent switching.

Examples

>>> config = CoordinationConfig()
>>> base_env = gym.make(...)
>>> novice = ...
>>> expert = ...
>>> env = CoordEnv(config, base_env, novice, expert)
config_cls
NOVICE = 0
EXPERT = 1
config
base_env
novice
expert
open_novice = True
open_expert = False
action_space
observation_space
expert_query_cost_per_action = None
switch_agent_cost_per_action = None
property num_envs: int

Number of parallel environments.

Returns:

Number of parallel environments.

Return type:

int

Examples

>>> n = env.num_envs
set_costs(base_penalty: float) → None

Set the cost per action for expert queries and agent switching.

Parameters:

base_penalty (float) – Per-action reward value from which expert_query_cost_per_action and switch_agent_cost_per_action are set.

Return type:

None

Examples

>>> env.set_costs(0.05)
reset() → Dict[str, Any]

Reset the coordination environment to an initial state.

Returns:

The initial observation of the environment, including:
  • "base_obs": The initial observation from the base environment.

  • "novice_hidden": Numpy array of hidden features from the novice policy (if open_novice).

  • "novice_logits": Numpy array of output logits from the novice policy (if open_novice).

  • "expert_hidden": Numpy array of hidden features from the expert policy (if open_expert).

  • "expert_logits": Numpy array of output logits from the expert policy (if open_expert).

Return type:

dict

Examples

>>> obs = env.reset()
_reset_agents(done: numpy.ndarray) → None

Reset the internal state of the novice and expert agents.

Parameters:

done (numpy.ndarray) – Boolean array indicating which episodes in a batch require a reset.

Return type:

None

Examples

>>> env._reset_agents(np.array([True, False]))
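One plausible reading of a per-episode reset is that the recurrent state of finished episodes is cleared while the others are left untouched. The sketch below illustrates that masking pattern only; `reset_hidden` and the zero-initialised state are assumptions, not the behaviour of CoordEnv.

```python
import numpy as np


def reset_hidden(hidden: np.ndarray, done: np.ndarray) -> np.ndarray:
    """Return a copy of `hidden` with the rows of finished episodes zeroed."""
    hidden = hidden.copy()
    hidden[done] = 0.0  # boolean mask selects the environments that ended
    return hidden


hidden = np.ones((2, 3))          # hypothetical state: 2 environments, 3 features
done = np.array([True, False])    # only the first episode has ended
new_hidden = reset_hidden(hidden, done)
```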
step(action: numpy.ndarray) → Tuple[Dict[str, Any], numpy.ndarray, numpy.ndarray, List[Dict[str, Any]]]

Advance the environment by one step using the provided action.

Parameters:

action (numpy.ndarray) – Array indicating, for each environment, which agent acts: NOVICE (0) or EXPERT (1).

Returns:

  • obs (dict) –

    The next observation of the environment, including:
    • "base_obs": The observation from the base environment.

    • "novice_hidden": Numpy array of hidden features from the novice policy (if open_novice).

    • "novice_logits": Numpy array of output logits from the novice policy (if open_novice).

    • "expert_hidden": Numpy array of hidden features from the expert policy (if open_expert).

    • "expert_logits": Numpy array of output logits from the expert policy (if open_expert).

  • reward (numpy.ndarray) – The reward(s) obtained from the environment after taking the action.

  • done (numpy.ndarray) – Boolean flag(s) indicating whether the episode has ended for each environment.

  • info (list of dict) – Additional information from the environment for each agent or environment instance.

Raises:

Exception – Propagates any exceptions raised by the underlying environment’s step method.

Examples

>>> obs, reward, done, info = env.step(action)
_compute_base_action(action: numpy.ndarray) → numpy.ndarray

Compute the environment-specific action for each agent.

Parameters:

action (numpy.ndarray) – Array indicating which agent (novice or expert) acts for each environment.

Returns:

Array of actions to be passed to the base environment.

Return type:

numpy.ndarray

Examples

>>> base_action = env._compute_base_action(action)
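The mapping from a coordination action to a base-environment action can be pictured as a per-environment selection between the two agents' proposals. The following is a minimal sketch of that idea, assuming both proposals are already available as arrays; `select_base_action` and the sample values are hypothetical.

```python
import numpy as np

NOVICE, EXPERT = 0, 1  # same values as CoordEnv.NOVICE and CoordEnv.EXPERT


def select_base_action(action, novice_action, expert_action):
    """Pick, per environment, the proposal of the agent chosen in `action`."""
    return np.where(action == EXPERT, expert_action, novice_action)


action = np.array([NOVICE, EXPERT, NOVICE])  # which agent acts in each environment
novice_action = np.array([3, 3, 7])          # hypothetical novice proposals
expert_action = np.array([5, 1, 2])          # hypothetical expert proposals
base_action = select_base_action(action, novice_action, expert_action)
```

Here the first and third environments take the novice's proposal and the second takes the expert's.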
_get_obs() → Dict[str, Any]

Return the current observation for the coordination environment.

Returns:

A dictionary containing:
  • "base_obs": The current observation from the base environment.

  • "novice_hidden": Numpy array of hidden features from the novice policy (if open_novice).

  • "novice_logits": Numpy array of output logits from the novice policy (if open_novice).

  • "expert_hidden": Numpy array of hidden features from the expert policy (if open_expert).

  • "expert_logits": Numpy array of output logits from the expert policy (if open_expert).

Return type:

dict

Examples

>>> obs = env._get_obs()
_get_reward(base_reward: numpy.ndarray, action: numpy.ndarray, done: numpy.ndarray) → numpy.ndarray

Compute the reward for the current step, including costs for expert queries and agent switching.

Parameters:
  • base_reward (numpy.ndarray) – The base reward from the environment.

  • action (numpy.ndarray) – The action(s) taken (novice or expert).

  • done (numpy.ndarray) – Boolean flag(s) indicating whether the episode has ended for each environment.

Returns:

The computed reward(s) after applying costs.

Return type:

numpy.ndarray

Examples

>>> reward = env._get_reward(base_reward, action, done)
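A cost-adjusted reward of this kind can be sketched as the base reward minus a query cost whenever the expert acts and a switch cost whenever the acting agent changes. The exact formula is not given in this reference, so everything below is an assumption: `shaped_reward`, the `prev_action` argument, and the cost values are hypothetical, and the role of `done` is left out.

```python
import numpy as np

NOVICE, EXPERT = 0, 1  # same values as CoordEnv.NOVICE and CoordEnv.EXPERT


def shaped_reward(base_reward, action, prev_action, query_cost, switch_cost):
    """Subtract per-action costs from the base reward (assumed formula)."""
    reward = base_reward.astype(float)               # astype returns a copy
    reward -= query_cost * (action == EXPERT)        # pay for querying the expert
    reward -= switch_cost * (action != prev_action)  # pay for changing agents
    return reward


base_reward = np.array([1.0, 1.0, 0.0])
prev_action = np.array([NOVICE, NOVICE, EXPERT])
action = np.array([NOVICE, EXPERT, EXPERT])
reward = shaped_reward(base_reward, action, prev_action,
                       query_cost=0.25, switch_cost=0.5)
```

In this toy batch only the second environment pays both costs, since it switches from the novice to the expert.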
close() → None

Close the coordination environment and release any resources held.

Return type:

None

Examples

>>> env.close()
class duo_ai.GeneralCoordEnv(config: CoordinationConfig, base_env: gymnasium.Env, novice: duo.core.Policy, expert: duo.core.Policy, open_novice: bool = True, open_expert: bool = False)

Bases: CoordEnv

Coordination environment supporting recurrent policies.

This class supports policies that maintain a hidden state across steps, but can be less efficient for stateless policies than CoordEnv.

Examples

>>> config = CoordinationConfig()
>>> base_env = gym.make(...)
>>> novice = ...
>>> expert = ...
>>> env = GeneralCoordEnv(config, base_env, novice, expert)
_compute_agents_action() → numpy.ndarray

Compute the actions for both novice and expert agents, supporting recurrent policies.

Returns:

Array of actions to be passed to the base environment.

Return type:

numpy.ndarray

Examples

>>> base_action = env._compute_agents_action()
_compute_base_action(action: numpy.ndarray) → numpy.ndarray

Compute the environment-specific action for each agent, supporting recurrent policies.

Parameters:

action (numpy.ndarray) – Array indicating which agent (novice or expert) acts for each environment.

Returns:

Array of actions to be passed to the base environment.

Return type:

numpy.ndarray

Examples

>>> base_action = env._compute_base_action(action)
_get_obs() → Dict[str, Any]

Return the current observation for the coordination environment, supporting recurrent policies.

Returns:

A dictionary containing:
  • "base_obs": The current observation from the base environment.

  • "novice_hidden": Numpy array of hidden features from the novice policy (if open_novice).

  • "novice_logits": Numpy array of output logits from the novice policy (if open_novice).

  • "expert_hidden": Numpy array of hidden features from the expert policy (if open_expert).

  • "expert_logits": Numpy array of output logits from the expert policy (if open_expert).

Return type:

dict

Examples

>>> obs = env._get_obs()
class duo_ai.Evaluator(config: EvaluatorConfig, env: gym.Env)

Evaluator for running policy evaluation on environments and summarizing results.

Examples

>>> evaluator = Evaluator(EvaluatorConfig(), env)
>>> summary = evaluator.evaluate(policy)
config_cls
config
env
evaluate(policy: duo_ai.core.Policy, num_episodes: int | None = None) → Dict[str, Any]

Evaluate a policy on the environment and summarize the results.

Parameters:
  • policy (duo_ai.core.Policy) – The policy to evaluate. Must implement an act method and have a .model attribute.

  • num_episodes (int, optional) – Number of episodes to run. If None, uses value from config.

Returns:

A dictionary mapping split names to summary statistics for each evaluation.

Return type:

dict

Examples

>>> summary = evaluator.evaluate(policy, num_episodes=100)
>>> for split, stats in summary.items():
...     print(split, stats)
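The shape of the returned summary, a dictionary keyed by split name, can be illustrated with a toy evaluation loop. This is only a sketch: the environment is replaced by a hypothetical `run_episode` function, the split names and seeds are made up, and the nesting of `reward_mean` under each split is an assumption.

```python
import random
from statistics import mean
from typing import Callable, Dict


def run_episode(policy: Callable[[int], int], rng: random.Random, length: int = 5) -> float:
    """Toy episode: reward 1.0 each time the policy returns the observation's parity."""
    total = 0.0
    for _ in range(length):
        obs = rng.randint(0, 9)
        total += 1.0 if policy(obs) == obs % 2 else 0.0
    return total


def evaluate(policy: Callable[[int], int], splits: Dict[str, int],
             num_episodes: int) -> Dict[str, Dict[str, float]]:
    """Run `num_episodes` per split and summarise the episode returns."""
    summary = {}
    for split, seed in splits.items():
        rng = random.Random(seed)  # one RNG per split for repeatability
        returns = [run_episode(policy, rng) for _ in range(num_episodes)]
        summary[split] = {"reward_mean": mean(returns), "num_episodes": num_episodes}
    return summary


# A policy that always answers correctly earns the maximum return of 5.0.
summary = evaluate(lambda obs: obs % 2, {"train": 0, "test": 1}, num_episodes=3)
```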
_eval_one_iteration(policy: duo_ai.core.Policy, env: gym.Env) → None

Run a single evaluation iteration for the policy on the environment.

Parameters:
  • policy (duo_ai.core.Policy) – The policy to evaluate.

  • env (gym.Env) – The environment instance to evaluate on.

Return type:

None

duo_ai.get_global_variable(key)

Retrieve the value of a global variable by key.

Parameters:

key (str) – The key for the global variable.

Returns:

The value of the global variable, or None if not set.

Return type:

Any or None

Examples

>>> get_global_variable('device')
'cuda'
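The lookup semantics, returning None for a key that was never set, match a dictionary's `get` method. A minimal sketch follows; the module-level store `_GLOBALS` and the helpers `store_global` and `lookup_global` are hypothetical names, not part of duo_ai.

```python
from typing import Any, Dict, Optional

_GLOBALS: Dict[str, Any] = {}  # hypothetical module-level store


def store_global(key: str, value: Any) -> None:
    """Hypothetical setter; in Duo, configure() is documented as setting the globals."""
    _GLOBALS[key] = value


def lookup_global(key: str) -> Optional[Any]:
    """Return the stored value, or None if the key was never set."""
    return _GLOBALS.get(key)


store_global("device", "cuda")
device = lookup_global("device")
missing = lookup_global("not_set")
```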
duo_ai.__version__
duo_ai.make_config(args: object, dotlist_args: object = None) → core.config.MasterConfig

Create and configure a MasterConfig object from command-line arguments.

Parameters:
  • args (object) – Arguments object with a ‘config’ attribute.

  • dotlist_args (object, optional) – Additional dotlist arguments for configuration.

Returns:

Configured MasterConfig object.

Return type:

MasterConfig

Examples

>>> config = make_config(args)
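If, as the name suggests, dotlist arguments are `key.subkey=value` overrides, their effect on a nested configuration can be sketched as below. This reference does not say how make_config parses them, so `apply_dotlist` is a hypothetical helper and the override strings are illustrative.

```python
from typing import Any, Dict, List


def apply_dotlist(config: Dict[str, Any], dotlist: List[str]) -> Dict[str, Any]:
    """Apply 'a.b=value' overrides to a nested dict in place and return it."""
    for item in dotlist:
        dotted_key, _, raw_value = item.partition("=")
        *parents, leaf = dotted_key.split(".")
        node = config
        for part in parents:
            # Walk down, creating intermediate dicts when a level is missing.
            node = node.setdefault(part, {})
        node[leaf] = raw_value  # values stay strings in this sketch
    return config


config = {"name": "default", "env": {"name": "procgen"}}
apply_dotlist(config, ["name=my_experiment", "env.name=other_env"])
```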
duo_ai.make_algorithm(config: object) → object

Instantiate an algorithm from the registry using the provided config.

Parameters:

config (object) – Algorithm configuration object with a ‘name’ attribute.

Returns:

Instantiated algorithm.

Return type:

object

Examples

>>> algo = make_algorithm(config)
duo_ai.make_policy(config: object, env: object) → object

Instantiate a policy from the registry using the provided config and environment.

Parameters:
  • config (object) – Policy configuration object with a ‘name’ attribute.

  • env (object) – Environment instance.

Returns:

Instantiated policy.

Return type:

object

Examples

>>> policy = make_policy(config, env)
duo_ai.load_policy(path: str, env: object) → object

Load a policy from a checkpoint file.

Parameters:
  • path (str) – Path to the checkpoint file.

  • env (object) – Environment instance.

Returns:

Loaded policy.

Return type:

object

Examples

>>> policy = load_policy("checkpoint.ckpt", env)
duo_ai.register_environment(name: str, config_cls: object) → None

Register an environment configuration class in the registry.

Parameters:
  • name (str) – Name of the environment.

  • config_cls (object) – Environment configuration class.

Return type:

None

Examples

>>> register_environment("myenv", MyEnvConfig)
duo_ai.register_algorithm(name: str, algorithm_cls: object) → None

Register an algorithm class in the registry.

Parameters:
  • name (str) – Name of the algorithm.

  • algorithm_cls (object) – Algorithm class.

Return type:

None

Examples

>>> register_algorithm("ppo", PPOAlgorithm)
duo_ai.register_policy(name: str, policy_cls: object) → None

Register a policy class in the registry.

Parameters:
  • name (str) – Name of the policy.

  • policy_cls (object) – Policy class.

Return type:

None

Examples

>>> register_policy("ppo", PPOPolicy)
duo_ai.register_model(name: str, model_cls: object) → None

Register a model class in the registry.

Parameters:
  • name (str) – Name of the model.

  • model_cls (object) – Model class.

Return type:

None

Examples

>>> register_model("mlp", MLPModel)