duo_ai¶

Attributes¶

__version__

Classes¶

`MasterConfig`	Main configuration class for the Duo framework.
`CoordEnv`	Environment for coordinating between novice and expert policies.
`GeneralCoordEnv`	Coordination environment supporting recurrent policies.
`Evaluator`	Evaluator for running policy evaluation on environments and summarizing results.

Functions¶

`configure`(→ None)	Set up experiment directory, logging, random seeds, and global variables for the experiment.
`get_global_variable`(key)	Retrieve the value of a global variable by key.
`make_config`(→ core.config.MasterConfig)	Create and configure a MasterConfig object from command-line arguments.
`make_algorithm`(→ object)	Instantiate an algorithm from the registry using the provided config.
`make_policy`(→ object)	Instantiate a policy from the registry using the provided config and environment.
`load_policy`(→ object)	Load a policy from a checkpoint file.
`register_environment`(→ None)	Register an environment configuration class in the registry.
`register_algorithm`(→ None)	Register an algorithm class in the registry.
`register_policy`(→ None)	Register a policy class in the registry.
`register_model`(→ None)	Register a model class in the registry.

Package Contents¶

class duo_ai.MasterConfig[source]¶

Main configuration class for the Duo framework.

This class holds all experiment-level configuration, including environment, policy, algorithm, evaluation, and coordination settings.

Parameters:

name (str, optional) – Name of the experiment. Default is “default”.
device (int, optional) – Device index for CUDA. Default is 0.
seed (int, optional) – Random seed for reproducibility. Default is 10.
env (Any, optional) – Environment configuration or name. Default is “procgen”.
policy (Any, optional) – Policy configuration or name. Default is “PPOPolicy”.
algorithm (Any, optional) – Algorithm configuration or name. Default is “PPOAlgorithm”.
evaluation (Any, optional) – Evaluation configuration. Default is None.
eval_name (str, optional) – Name for evaluation run. Default is None.
overwrite (bool, optional) – Whether to overwrite existing experiment directory. Default is False.
use_wandb (bool, optional) – Whether to use Weights & Biases logging. Default is False.
experiment_dir (str, optional) – Path to the experiment directory. Default is “”.
train_novice (str, optional) – Path to novice training checkpoint. Default is None.
train_expert (str, optional) – Path to expert training checkpoint. Default is None.
test_novice (str, optional) – Path to novice test checkpoint. Default is None.
test_expert (str, optional) – Path to expert test checkpoint. Default is None.
coordination (Any, optional) – Coordination configuration. Default is None.

Examples

>>> config = MasterConfig(name="my_experiment", env="procgen", policy="PPOPolicy")

name: str = 'default'¶

device: int = 0¶

seed: int = 10¶

env: Any = 'procgen'¶

policy: Any = 'PPOPolicy'¶

algorithm: Any = 'PPOAlgorithm'¶

evaluation: Any = None¶

eval_mode: int | None = None¶

eval_name: str | None = None¶

overwrite: bool = False¶

use_wandb: bool = False¶

experiment_dir: str = ''¶

train_novice: str | None = None¶

train_expert: str | None = None¶

test_novice: str | None = None¶

test_expert: str | None = None¶

coordination: Any = None¶

__post_init__() → None[source]¶

Post-initialization logic for MasterConfig.

Converts string or dictionary fields for env, policy, algorithm, evaluation, and coordination into their respective configuration objects.

Raises:

IndexError – If required keys are missing in configuration dictionaries.
ValueError – If configuration fields are not of expected types.

Examples

>>> config = MasterConfig(env={"name": "procgen"})  # dataclasses call __post_init__ automatically
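
The conversion described above can be sketched as follows. This is a minimal illustration, assuming MasterConfig is a dataclass (the __post_init__ hook suggests this, but it is not confirmed here). `EnvConfig`, `SketchConfig`, and the `num_envs` field are hypothetical stand-ins, not the library's classes.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class EnvConfig:
    """Hypothetical stand-in for an environment configuration class."""
    name: str = "procgen"
    num_envs: int = 1


@dataclass
class SketchConfig:
    """Hypothetical stand-in for MasterConfig, reduced to a single field."""
    env: Any = "procgen"

    def __post_init__(self) -> None:
        # Dataclasses invoke this hook automatically at the end of __init__.
        if isinstance(self.env, str):
            # A bare string is treated as the configuration name.
            self.env = EnvConfig(name=self.env)
        elif isinstance(self.env, dict):
            # A dictionary is expanded into keyword arguments.
            self.env = EnvConfig(**self.env)
        elif not isinstance(self.env, EnvConfig):
            raise ValueError(f"Unsupported env type: {type(self.env).__name__}")


from_name = SketchConfig(env="procgen")
from_dict = SketchConfig(env={"name": "procgen", "num_envs": 4})
```

Because the hook runs during construction, both objects already hold an `EnvConfig` instance in `env` without any explicit call.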

duo_ai.configure(config: MasterConfig) → None[source]¶

Set up experiment directory, logging, random seeds, and global variables for the experiment.

Parameters: config (MasterConfig) – The experiment configuration object.
Return type: None
Raises: FileExistsError – If the experiment directory exists and overwrite is not set.

Examples

>>> configure(config)
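
The directory and seeding behavior described above might look roughly like the sketch below. `prepare_experiment_dir` and `seed_everything` are hypothetical helper names, and deleting the old directory on overwrite is an assumption. The real configure also sets up logging and global variables, which are omitted here.

```python
import os
import random
import shutil
import tempfile


def prepare_experiment_dir(path: str, overwrite: bool = False) -> str:
    """Create `path`, refusing to replace an existing directory unless asked."""
    if os.path.exists(path):
        if not overwrite:
            raise FileExistsError(f"Experiment directory exists: {path}")
        shutil.rmtree(path)  # assumption: overwrite discards the old contents
    os.makedirs(path)
    return path


def seed_everything(seed: int) -> None:
    """Seed Python's RNG; a full setup would also seed NumPy and the DL framework."""
    random.seed(seed)


root = tempfile.mkdtemp()
exp_dir = prepare_experiment_dir(os.path.join(root, "default"))

# Re-seeding with the same value reproduces the same random draw.
seed_everything(10)
first = random.random()
seed_everything(10)
second = random.random()
```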

class duo_ai.CoordEnv(config: CoordinationConfig, base_env: gymnasium.Env, novice: duo.core.Policy, expert: duo.core.Policy, open_novice: bool = True, open_expert: bool = False)[source]¶

Bases: gymnasium.Env

Environment for coordinating between novice and expert policies.

This class wraps a base environment and enables switching between a novice and expert policy, applying costs for expert queries and agent switching.

Examples

>>> config = CoordinationConfig()
>>> base_env = gym.make(...)
>>> novice = ...
>>> expert = ...
>>> env = CoordEnv(config, base_env, novice, expert)

config_cls¶

NOVICE = 0¶

EXPERT = 1¶

config¶

base_env¶

novice¶

expert¶

open_novice = True¶

open_expert = False¶

action_space¶

observation_space¶

expert_query_cost_per_action = None¶

switch_agent_cost_per_action = None¶

property num_envs: int¶

Number of parallel environments.

Returns: Number of parallel environments.
Return type: int

Examples

>>> n = env.num_envs

set_costs(base_penalty: float) → None[source]¶

Set the cost per action for expert queries and agent switching.

Parameters: base_penalty (float) – Base penalty per action, in reward units, used to set the expert-query and agent-switching costs.
Return type: None

Examples

>>> env.set_costs(0.05)

reset() → Dict[str, Any][source]¶

Reset the coordination environment to an initial state.

Returns:

The initial observation of the environment, including:

“base_obs”: The initial observation from the base environment.
“novice_hidden”: Numpy array of hidden features from the novice policy.
“novice_logits”: Numpy array of output logits from the novice policy.
“expert_hidden”: Numpy array of hidden features from the expert policy (if open_expert).
“expert_logits”: Numpy array of output logits from the expert policy (if open_expert).

Return type:

dict

Examples

>>> obs = env.reset()

_reset_agents(done: numpy.ndarray) → None[source]¶

Reset the internal state of the novice and expert agents.

Parameters: done (numpy.ndarray) – Boolean array indicating which episodes in a batch require a reset.
Return type: None

Examples

>>> env._reset_agents(np.array([True, False]))

step(action: numpy.ndarray) → Tuple[Dict[str, Any], numpy.ndarray, numpy.ndarray, List[Dict[str, Any]]][source]¶

Advance the environment by one step using the provided action.

Parameters:

action (numpy.ndarray) – The coordination action for each environment: an array whose entries select the acting agent (NOVICE = 0 or EXPERT = 1).

Returns:

obs (dict) – The next observation of the environment, including:
- “base_obs”: The observation from the base environment.
- “novice_hidden”: Numpy array of hidden features from the novice policy.
- “novice_logits”: Numpy array of output logits from the novice policy.
- “expert_hidden”: Numpy array of hidden features from the expert policy (if open_expert).
- “expert_logits”: Numpy array of output logits from the expert policy (if open_expert).
reward (numpy.ndarray) – The reward(s) obtained from the environment after taking the action.
done (numpy.ndarray) – Boolean flag(s) indicating whether the episode has ended for each environment.
info (list of dict) – Additional information from the environment for each agent or environment instance.

Raises:

Exception – Propagates any exceptions raised by the underlying environment’s step method.

Examples

>>> obs, reward, done, info = env.step(action)
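
Note that step is documented as returning a 4-tuple and reset as returning only the observation. Gymnasium's own Env API instead returns (obs, reward, terminated, truncated, info) from step and (obs, info) from reset. The sketch below shows a rollout loop written against the documented signatures. `StubCoordEnv` is a hypothetical stand-in with made-up rewards, and it uses plain lists instead of NumPy arrays to stay dependency-free.

```python
from typing import Any, Dict, List, Tuple

NOVICE, EXPERT = 0, 1  # same values as CoordEnv.NOVICE / CoordEnv.EXPERT


class StubCoordEnv:
    """Hypothetical stand-in mimicking the documented reset/step signatures."""

    num_envs = 2

    def __init__(self, horizon: int = 3) -> None:
        self.horizon = horizon
        self.t = 0

    def _obs(self) -> Dict[str, Any]:
        # Only "base_obs" is modeled; the policy features are omitted.
        return {"base_obs": [self.t] * self.num_envs}

    def reset(self) -> Dict[str, Any]:
        self.t = 0
        return self._obs()

    def step(
        self, action: List[int]
    ) -> Tuple[Dict[str, Any], List[float], List[bool], List[dict]]:
        self.t += 1
        # Made-up rewards: acting through the expert is slightly penalized.
        reward = [1.0 if a == NOVICE else 0.9 for a in action]
        done = [self.t >= self.horizon] * self.num_envs
        info = [{} for _ in range(self.num_envs)]
        return self._obs(), reward, done, info


env = StubCoordEnv()
obs = env.reset()
totals = [0.0] * env.num_envs
done = [False] * env.num_envs
while not all(done):
    # Toy coordination rule: ask the expert on the first step only.
    action = [EXPERT if o == 0 else NOVICE for o in obs["base_obs"]]
    obs, reward, done, info = env.step(action)
    totals = [t + r for t, r in zip(totals, reward)]
```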

_compute_base_action(action: numpy.ndarray) → numpy.ndarray[source]¶

Compute the environment-specific action for each agent.

Parameters: action (numpy.ndarray) – Array indicating which agent (novice or expert) acts for each environment.
Returns: Array of actions to be passed to the base environment.
Return type: numpy.ndarray

Examples

>>> base_action = env._compute_base_action(action)
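
One plausible reading of this method is a per-environment selection between the two policies' proposed actions. The sketch below illustrates that idea only. `select_base_action` is a hypothetical name, and how the real method obtains each policy's action is not shown in this reference.

```python
from typing import List

NOVICE, EXPERT = 0, 1  # same values as CoordEnv.NOVICE / CoordEnv.EXPERT


def select_base_action(
    action: List[int], novice_action: List[int], expert_action: List[int]
) -> List[int]:
    """Pick, for each environment, the action proposed by the selected agent."""
    chosen = []
    for who, a_novice, a_expert in zip(action, novice_action, expert_action):
        chosen.append(a_expert if who == EXPERT else a_novice)
    return chosen


base_action = select_base_action(
    action=[NOVICE, EXPERT, NOVICE],
    novice_action=[3, 3, 7],
    expert_action=[5, 1, 2],
)
```

With NumPy arrays the same selection can be written as `np.where(action == EXPERT, expert_action, novice_action)`.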

_get_obs() → Dict[str, Any][source]¶

Return the current observation for the coordination environment.

Returns:

A dictionary containing:

“base_obs”: The current observation from the base environment.
“novice_hidden”: Numpy array of hidden features from the novice policy (if open_novice).
“novice_logits”: Numpy array of output logits from the novice policy (if open_novice).
“expert_hidden”: Numpy array of hidden features from the expert policy (if open_expert).
“expert_logits”: Numpy array of output logits from the expert policy (if open_expert).

Return type:

dict

Examples

>>> obs = env._get_obs()

_get_reward(base_reward: numpy.ndarray, action: numpy.ndarray, done: numpy.ndarray) → numpy.ndarray[source]¶

Compute the reward for the current step, including costs for expert queries and agent switching.

Parameters:

base_reward (numpy.ndarray) – The base reward from the environment.
action (numpy.ndarray) – The action(s) taken (novice or expert).
done (numpy.ndarray) – Boolean flag(s) indicating whether the episode has ended for each environment.

Returns:

The computed reward(s) after applying costs.

Return type:

numpy.ndarray

Examples

>>> reward = env._get_reward(base_reward, action, done)
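
The exact cost formula is not given in this reference. The sketch below shows one common way to combine a base reward with a query cost and a switching cost, for a single environment. `RewardShaper`, the subtraction scheme, and the reset of the previous agent at episode end are all assumptions, not the library's implementation.

```python
from typing import Optional

NOVICE, EXPERT = 0, 1  # same values as CoordEnv.NOVICE / CoordEnv.EXPERT


class RewardShaper:
    """Hypothetical reward shaping for one environment (not the library's code)."""

    def __init__(self, query_cost: float, switch_cost: float) -> None:
        self.query_cost = query_cost
        self.switch_cost = switch_cost
        self.prev_action: Optional[int] = None  # no agent has acted yet

    def __call__(self, base_reward: float, action: int, done: bool) -> float:
        reward = base_reward
        if action == EXPERT:
            reward -= self.query_cost  # charge for querying the expert
        if self.prev_action is not None and action != self.prev_action:
            reward -= self.switch_cost  # charge for handing over control
        # Forget the previous agent at episode end so the next episode starts clean.
        self.prev_action = None if done else action
        return reward


shaper = RewardShaper(query_cost=0.05, switch_cost=0.01)
rewards = [
    shaper(1.0, NOVICE, False),  # first step, novice acts: no cost
    shaper(1.0, EXPERT, False),  # switch to expert: query cost plus switch cost
    shaper(1.0, EXPERT, True),   # expert again, episode ends: query cost only
    shaper(1.0, EXPERT, False),  # new episode: query cost only, no switch charge
]
```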

close() → None[source]¶

Close the coordination environment and release any resources held.

Return type: None

Examples

>>> env.close()

class duo_ai.GeneralCoordEnv(config: CoordinationConfig, base_env: gymnasium.Env, novice: duo.core.Policy, expert: duo.core.Policy, open_novice: bool = True, open_expert: bool = False)[source]¶

Bases: CoordEnv

Coordination environment supporting recurrent policies.

This class supports policies that maintain a hidden state across steps, but can be less efficient for stateless policies than CoordEnv.

Examples

>>> config = CoordinationConfig()
>>> base_env = gym.make(...)
>>> novice = ...
>>> expert = ...
>>> env = GeneralCoordEnv(config, base_env, novice, expert)

_compute_agents_action() → numpy.ndarray[source]¶

Compute the actions for both novice and expert agents, supporting recurrent policies.

Returns: Array of actions to be passed to the base environment.
Return type: numpy.ndarray

Examples

>>> base_action = env._compute_agents_action()

_compute_base_action(action: numpy.ndarray) → numpy.ndarray[source]¶

Compute the environment-specific action for each agent, supporting recurrent policies.

Parameters: action (numpy.ndarray) – Array indicating which agent (novice or expert) acts for each environment.
Returns: Array of actions to be passed to the base environment.
Return type: numpy.ndarray

Examples

>>> base_action = env._compute_base_action(action)

_get_obs() → Dict[str, Any][source]¶

Return the current observation for the coordination environment, supporting recurrent policies.

Returns:

A dictionary containing:

“base_obs”: The current observation from the base environment.
“novice_hidden”: Numpy array of hidden features from the novice policy (if open_novice).
“novice_logits”: Numpy array of output logits from the novice policy (if open_novice).
“expert_hidden”: Numpy array of hidden features from the expert policy (if open_expert).
“expert_logits”: Numpy array of output logits from the expert policy (if open_expert).

Return type:

dict

Examples

>>> obs = env._get_obs()

class duo_ai.Evaluator(config: EvaluatorConfig, env: gym.Env)[source]¶

Evaluator for running policy evaluation on environments and summarizing results.

Examples

>>> evaluator = Evaluator(EvaluatorConfig(), env)
>>> summary = evaluator.evaluate(policy)

config_cls¶

config¶

env¶

evaluate(policy: duo_ai.core.Policy, num_episodes: int | None = None) → Dict[str, Any][source]¶

Evaluate a policy on the environment and summarize the results.

Parameters:

policy (duo_ai.core.Policy) – The policy to evaluate. Must implement an act method and have a .model attribute.
num_episodes (int, optional) – Number of episodes to run. If None, uses value from config.

Returns:

A dictionary mapping split names to summary statistics for each evaluation.

Return type:

dict

Examples

>>> summary = evaluator.evaluate(policy, num_episodes=100)
>>> print(summary['reward_mean'])
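
The return description above suggests per-split summaries. Under that reading, the aggregation step could be sketched as below. The split names, the `summarize` helper, and every key other than `reward_mean` are hypothetical, and the actual layout of the returned dictionary should be checked against the source.

```python
from statistics import mean, pstdev
from typing import Dict, List


def summarize(episode_rewards: List[float]) -> Dict[str, float]:
    """Reduce per-episode rewards to a few summary statistics."""
    return {
        "num_episodes": len(episode_rewards),
        "reward_mean": mean(episode_rewards),
        "reward_std": pstdev(episode_rewards),  # population standard deviation
    }


# Hypothetical per-split episode rewards collected during evaluation.
rewards_by_split = {"train": [2.0, 2.0], "test": [1.0, 3.0]}
summary = {split: summarize(r) for split, r in rewards_by_split.items()}
```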

_eval_one_iteration(policy: duo_ai.core.Policy, env: gym.Env) → None[source]¶

Run a single evaluation iteration for the policy on the environment.

Parameters:

policy (duo_ai.core.Policy) – The policy to evaluate.
env (gym.Env) – The environment instance to evaluate on.

Return type:

None

duo_ai.get_global_variable(key)[source]¶

Retrieve the value of a global variable by key.

Parameters: key (str) – The key for the global variable.
Returns: The value of the global variable, or None if not set.
Return type: Any or None

Examples

>>> get_global_variable('device')
'cuda'
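
The "None if not set" behavior matches a plain dictionary lookup. A minimal sketch, assuming the globals live in a module-level dictionary; `_GLOBALS` and `set_global_variable` are hypothetical names that do not appear in this reference.

```python
from typing import Any, Dict, Optional

# Hypothetical module-level store populated during experiment setup.
_GLOBALS: Dict[str, Any] = {}


def set_global_variable(key: str, value: Any) -> None:
    """Store a value under `key` (hypothetical counterpart to the getter)."""
    _GLOBALS[key] = value


def get_global_variable(key: str) -> Optional[Any]:
    """Return the stored value, or None when the key was never set."""
    return _GLOBALS.get(key)


set_global_variable("device", "cuda")
device = get_global_variable("device")
missing = get_global_variable("not_set")
```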

duo_ai.__version__¶

duo_ai.make_config(args: object, dotlist_args: object = None) → core.config.MasterConfig[source]¶

Create and configure a MasterConfig object from command-line arguments.

Parameters:

args (object) – Arguments object with a ‘config’ attribute.
dotlist_args (object, optional) – Additional dotlist arguments for configuration.

Returns:

Configured MasterConfig object.

Return type:

MasterConfig

Examples

>>> config = make_config(args)
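
A dotlist is conventionally a list of `dotted.key=value` strings that override nested configuration entries. How make_config parses them is not stated here. The sketch below shows the general idea on plain dictionaries; `apply_dotlist`, `_parse`, and the `lr` key are hypothetical.

```python
import ast
from typing import Any, Dict, List


def _parse(raw: str) -> Any:
    """Interpret numbers and other Python literals; keep anything else as a string."""
    try:
        return ast.literal_eval(raw)
    except (ValueError, SyntaxError):
        return raw


def apply_dotlist(config: Dict[str, Any], dotlist: List[str]) -> Dict[str, Any]:
    """Apply overrides of the form 'a.b=value' to a nested dictionary in place."""
    for item in dotlist:
        dotted_key, _, raw_value = item.partition("=")
        *parents, leaf = dotted_key.split(".")
        node = config
        for part in parents:
            # Create intermediate dictionaries as needed.
            node = node.setdefault(part, {})
        node[leaf] = _parse(raw_value)
    return config


config = {"seed": 10, "algorithm": {"name": "PPOAlgorithm"}}
apply_dotlist(config, ["seed=42", "algorithm.lr=0.0003", "name=my_experiment"])
```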

duo_ai.make_algorithm(config: object) → object[source]¶

Instantiate an algorithm from the registry using the provided config.

Parameters: config (object) – Algorithm configuration object with a ‘name’ attribute.
Returns: Instantiated algorithm.
Return type: object

Examples

>>> algo = make_algorithm(config)

duo_ai.make_policy(config: object, env: object) → object[source]¶

Instantiate a policy from the registry using the provided config and environment.

Parameters:

config (object) – Policy configuration object with a ‘name’ attribute.
env (object) – Environment instance.

Returns:

Instantiated policy.

Return type:

object

Examples

>>> policy = make_policy(config, env)

duo_ai.load_policy(path: str, env: object) → object[source]¶

Load a policy from a checkpoint file.

Parameters:

path (str) – Path to the checkpoint file.
env (object) – Environment instance.

Returns:

Loaded policy.

Return type:

object

Examples

>>> policy = load_policy("checkpoint.ckpt", env)

duo_ai.register_environment(name: str, config_cls: object) → None[source]¶

Register an environment configuration class in the registry.

Parameters:

name (str) – Name of the environment.
config_cls (object) – Environment configuration class.

Return type:

None

Examples

>>> register_environment("myenv", MyEnvConfig)

duo_ai.register_algorithm(name: str, algorithm_cls: object) → None[source]¶

Register an algorithm class in the registry.

Parameters:

name (str) – Name of the algorithm.
algorithm_cls (object) – Algorithm class.

Return type:

None

Examples

>>> register_algorithm("ppo", PPOAlgorithm)

duo_ai.register_policy(name: str, policy_cls: object) → None[source]¶

Register a policy class in the registry.

Parameters:

name (str) – Name of the policy.
policy_cls (object) – Policy class.

Return type:

None

Examples

>>> register_policy("ppo", PPOPolicy)

duo_ai.register_model(name: str, model_cls: object) → None[source]¶

Register a model class in the registry.

Parameters:

name (str) – Name of the model.
model_cls (object) – Model class.

Return type:

None

Examples

>>> register_model("mlp", MLPModel)
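
The register_* and make_* functions together follow a name-to-class registry pattern. The sketch below illustrates that pattern in isolation. The dictionary `_ALGORITHMS`, the duplicate-name check, the constructor signature, and `DummyPPO` are assumptions for illustration, not the library's implementation.

```python
from types import SimpleNamespace
from typing import Any, Dict

# Hypothetical registry mapping names to algorithm classes.
_ALGORITHMS: Dict[str, type] = {}


def register_algorithm(name: str, algorithm_cls: type) -> None:
    """Record `algorithm_cls` under `name`, rejecting duplicate names."""
    if name in _ALGORITHMS:
        raise ValueError(f"Algorithm already registered: {name}")
    _ALGORITHMS[name] = algorithm_cls


def make_algorithm(config: Any) -> object:
    """Look up the class by `config.name` and instantiate it with the config."""
    return _ALGORITHMS[config.name](config)


class DummyPPO:
    """Hypothetical algorithm class that simply keeps its config."""

    def __init__(self, config: Any) -> None:
        self.config = config


register_algorithm("ppo", DummyPPO)
algo = make_algorithm(SimpleNamespace(name="ppo", lr=0.1))
```

Keeping the lookup keyed on a plain string is what lets a configuration file select an implementation by name.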