duo_ai¶
Submodules¶
Attributes¶
Classes¶
MasterConfig | Main configuration class for the Duo framework.
CoordEnv | Environment for coordinating between novice and expert policies.
GeneralCoordEnv | Coordination environment supporting recurrent policies.
Evaluator | Evaluator for running policy evaluation on environments and summarizing results.
Functions¶
configure | Set up experiment directory, logging, random seeds, and global variables for the experiment.
get_global_variable | Retrieve the value of a global variable by key.
make_config | Create and configure a MasterConfig object from command-line arguments.
make_algorithm | Instantiate an algorithm from the registry using the provided config.
make_policy | Instantiate a policy from the registry using the provided config and environment.
load_policy | Load a policy from a checkpoint file.
register_environment | Register an environment configuration class in the registry.
register_algorithm | Register an algorithm class in the registry.
|
Register a policy class in the registry. |
|
Register a model class in the registry. |
Package Contents¶
- class duo_ai.MasterConfig[source]¶
Main configuration class for the Duo framework.
This class holds all experiment-level configuration, including environment, policy, algorithm, evaluation, and coordination settings.
- Parameters:
name (str, optional) – Name of the experiment. Default is “default”.
device (int, optional) – Device index for CUDA. Default is 0.
seed (int, optional) – Random seed for reproducibility. Default is 10.
env (Any, optional) – Environment configuration or name. Default is “procgen”.
policy (Any, optional) – Policy configuration or name. Default is “PPOPolicy”.
algorithm (Any, optional) – Algorithm configuration or name. Default is “PPOAlgorithm”.
evaluation (Any, optional) – Evaluation configuration. Default is None.
eval_name (str, optional) – Name for evaluation run. Default is None.
overwrite (bool, optional) – Whether to overwrite existing experiment directory. Default is False.
use_wandb (bool, optional) – Whether to use Weights & Biases logging. Default is False.
experiment_dir (str, optional) – Path to the experiment directory. Default is “”.
train_novice (str, optional) – Path to novice training checkpoint. Default is None.
train_expert (str, optional) – Path to expert training checkpoint. Default is None.
test_novice (str, optional) – Path to novice test checkpoint. Default is None.
test_expert (str, optional) – Path to expert test checkpoint. Default is None.
coordination (Any, optional) – Coordination configuration. Default is None.
Examples
>>> config = MasterConfig(name="my_experiment", env="procgen", policy="PPOPolicy")
- name: str = 'default'¶
- device: int = 0¶
- seed: int = 10¶
- env: Any = 'procgen'¶
- policy: Any = 'PPOPolicy'¶
- algorithm: Any = 'PPOAlgorithm'¶
- evaluation: Any = None¶
- eval_mode: int | None = None¶
- eval_name: str | None = None¶
- overwrite: bool = False¶
- use_wandb: bool = False¶
- experiment_dir: str = ''¶
- train_novice: str | None = None¶
- train_expert: str | None = None¶
- test_novice: str | None = None¶
- test_expert: str | None = None¶
- coordination: Any = None¶
- __post_init__() None[source]¶
Post-initialization logic for MasterConfig.
Converts string or dictionary fields for env, policy, algorithm, evaluation, and coordination into their respective configuration objects.
- Raises:
IndexError – If required keys are missing in configuration dictionaries.
ValueError – If configuration fields are not of expected types.
Examples
>>> config = MasterConfig(env={"name": "procgen"})
>>> config.__post_init__()
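The conversion described above can be pictured with a small stand-alone dataclass. This is only a sketch: `SketchEnvConfig` and `SketchMasterConfig` are hypothetical stand-ins, not the duo_ai classes, and the real conversion rules and exception types may differ.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class SketchEnvConfig:
    """Hypothetical stand-in for an environment configuration object."""
    name: str = "procgen"


@dataclass
class SketchMasterConfig:
    """Hypothetical stand-in illustrating the __post_init__ conversion pattern."""
    name: str = "default"
    env: Any = "procgen"

    def __post_init__(self) -> None:
        # A dataclass runs __post_init__ automatically at the end of __init__.
        if isinstance(self.env, str):
            self.env = SketchEnvConfig(name=self.env)
        elif isinstance(self.env, dict):
            # A missing "name" key raises KeyError in this sketch.
            self.env = SketchEnvConfig(name=self.env["name"])
        elif not isinstance(self.env, SketchEnvConfig):
            raise ValueError(f"Unsupported env type: {type(self.env).__name__}")


from_str = SketchMasterConfig(env="procgen")
from_dict = SketchMasterConfig(env={"name": "procgen"})
```

In this sketch both `from_str.env` and `from_dict.env` end up as equal `SketchEnvConfig` objects, and no explicit `__post_init__()` call is needed because the dataclass machinery has already run it.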
- duo_ai.configure(config: MasterConfig) None[source]¶
Set up experiment directory, logging, random seeds, and global variables for the experiment.
- Parameters:
config (MasterConfig) – The experiment configuration object.
- Return type:
None
- Raises:
FileExistsError – If the experiment directory exists and overwrite is not set.
Examples
>>> configure(config)
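The directory and seeding behaviour described above can be sketched without duo_ai. `sketch_configure` is a hypothetical helper that assumes "overwrite is not set" means `overwrite=False`; logging and global-variable setup are left out.

```python
import os
import random
import tempfile


def sketch_configure(experiment_dir: str, seed: int, overwrite: bool = False) -> None:
    """Hypothetical stand-in: create the experiment directory and seed the RNG."""
    if os.path.exists(experiment_dir) and not overwrite:
        raise FileExistsError(f"{experiment_dir} exists; set overwrite=True to reuse it")
    os.makedirs(experiment_dir, exist_ok=True)
    # Only the stdlib RNG is seeded here; a real setup would likely seed
    # NumPy and the deep learning framework as well.
    random.seed(seed)


root = tempfile.mkdtemp()
exp_dir = os.path.join(root, "my_experiment")
sketch_configure(exp_dir, seed=10)

# A second call on the same directory fails unless overwrite is requested.
try:
    sketch_configure(exp_dir, seed=10)
    second_call_failed = False
except FileExistsError:
    second_call_failed = True
```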
- class duo_ai.CoordEnv(config: CoordinationConfig, base_env: gymnasium.Env, novice: duo.core.Policy, expert: duo.core.Policy, open_novice: bool = True, open_expert: bool = False)[source]¶
Bases:
gymnasium.Env
Environment for coordinating between novice and expert policies.
This class wraps a base environment and enables switching between a novice and expert policy, applying costs for expert queries and agent switching.
Examples
>>> config = CoordinationConfig()
>>> base_env = gym.make(...)
>>> novice = ...
>>> expert = ...
>>> env = CoordEnv(config, base_env, novice, expert)
- config_cls¶
- NOVICE = 0¶
- EXPERT = 1¶
- config¶
- base_env¶
- novice¶
- expert¶
- open_novice = True¶
- open_expert = False¶
- action_space¶
- observation_space¶
- expert_query_cost_per_action = None¶
- switch_agent_cost_per_action = None¶
- property num_envs: int¶
Number of parallel environments.
- Returns:
Number of parallel environments.
- Return type:
int
Examples
>>> n = env.num_envs
- set_costs(base_penalty: float) None[source]¶
Set the cost per action for expert queries and agent switching.
- Parameters:
base_penalty (float) – The reward value per action.
- Return type:
None
Examples
>>> env.set_costs(0.05)
- reset() Dict[str, Any][source]¶
Reset the coordination environment to an initial state.
- Returns:
- The initial observation of the environment, including:
”base_obs”: The initial observation from the base environment.
”novice_hidden”: Numpy array of hidden features from the novice policy.
”novice_logits”: Numpy array of output logits from the novice policy.
”expert_hidden”: Numpy array of hidden features from the expert policy (if open_expert).
”expert_logits”: Numpy array of output logits from the expert policy (if open_expert).
- Return type:
dict
Examples
>>> obs = env.reset()
- _reset_agents(done: numpy.ndarray) None[source]¶
Reset the internal state of the novice and expert agents.
- Parameters:
done (numpy.ndarray) – Boolean array indicating which episodes in a batch require a reset.
- Return type:
None
Examples
>>> env._reset_agents(np.array([True, False]))
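One way to picture a masked reset over a batch is shown below. It is a sketch under an assumption: resetting an agent is taken to mean zeroing its per-environment hidden state, which the documentation above does not specify. Plain lists stand in for NumPy arrays, and `sketch_reset_hidden` is a hypothetical name.

```python
from typing import List


def sketch_reset_hidden(hidden: List[List[float]], done: List[bool]) -> List[List[float]]:
    """Zero the hidden state of every environment whose episode has ended."""
    return [
        [0.0] * len(state) if finished else list(state)
        for state, finished in zip(hidden, done)
    ]


hidden = [[0.3, -1.2], [0.7, 0.1]]
# Only the first environment is done, so only its state is cleared.
new_hidden = sketch_reset_hidden(hidden, [True, False])
```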
- step(action: numpy.ndarray) Tuple[Dict[str, Any], numpy.ndarray, numpy.ndarray, List[Dict[str, Any]]][source]¶
Advance the environment by one step using the provided action.
- Parameters:
action (numpy.ndarray) – The action(s) to take in the environment. Should be a numpy array indicating which agent acts.
- Returns:
obs (dict) –
- The next observation of the environment, including:
”base_obs”: The observation from the base environment.
”novice_hidden”: Numpy array of hidden features from the novice policy.
”novice_logits”: Numpy array of output logits from the novice policy.
”expert_hidden”: Numpy array of hidden features from the expert policy (if open_expert).
”expert_logits”: Numpy array of output logits from the expert policy (if open_expert).
reward (numpy.ndarray) – The reward(s) obtained from the environment after taking the action.
done (numpy.ndarray) – Boolean flag(s) indicating whether the episode has ended for each environment.
info (list of dict) – Additional information from the environment for each agent or environment instance.
- Raises:
Exception – Propagates any exceptions raised by the underlying environment’s step method.
Examples
>>> obs, reward, done, info = env.step(action)
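The four-value return of `step` suggests a rollout loop like the one below. `StubCoordEnv` is a hypothetical single-environment stand-in that only mimics the shapes of `reset` and `step` documented above; its rewards and the toy switching rule are invented for illustration.

```python
from typing import Any, Dict, List, Tuple

NOVICE, EXPERT = 0, 1  # same values as CoordEnv.NOVICE and CoordEnv.EXPERT


class StubCoordEnv:
    """Hypothetical stand-in with CoordEnv's reset/step return shapes."""

    def __init__(self, horizon: int = 3) -> None:
        self.horizon = horizon
        self.t = 0

    def reset(self) -> Dict[str, Any]:
        self.t = 0
        return {"base_obs": [self.t]}

    def step(
        self, action: List[int]
    ) -> Tuple[Dict[str, Any], List[float], List[bool], List[Dict[str, Any]]]:
        self.t += 1
        reward = [1.0 if a == EXPERT else 0.5 for a in action]  # invented rewards
        done = [self.t >= self.horizon]
        return {"base_obs": [self.t]}, reward, done, [{} for _ in action]


env = StubCoordEnv()
obs = env.reset()
total, done = 0.0, [False]
while not all(done):
    # Toy rule: query the expert on the first step, then let the novice act.
    action = [EXPERT if obs["base_obs"][0] == 0 else NOVICE]
    obs, reward, done, info = env.step(action)
    total += reward[0]
```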
- _compute_base_action(action: numpy.ndarray) numpy.ndarray[source]¶
Compute the environment-specific action for each agent.
- Parameters:
action (numpy.ndarray) – Array indicating which agent (novice or expert) acts for each environment.
- Returns:
Array of actions to be passed to the base environment.
- Return type:
numpy.ndarray
Examples
>>> base_action = env._compute_base_action(action)
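The selection step can be sketched as a per-environment choice between the two agents' proposed actions. This assumes both proposals are already available; `sketch_compute_base_action` is a hypothetical helper, and plain lists stand in for NumPy arrays.

```python
from typing import List

NOVICE, EXPERT = 0, 1  # same values as CoordEnv.NOVICE and CoordEnv.EXPERT


def sketch_compute_base_action(
    action: List[int], novice_action: List[int], expert_action: List[int]
) -> List[int]:
    """Pick, per environment, the action proposed by the selected agent."""
    return [
        e if choice == EXPERT else n
        for choice, n, e in zip(action, novice_action, expert_action)
    ]


base_action = sketch_compute_base_action(
    action=[NOVICE, EXPERT, NOVICE],
    novice_action=[2, 2, 5],
    expert_action=[7, 4, 9],
)
```

With NumPy arrays the same selection could be written as `np.where(action == EXPERT, expert_action, novice_action)`.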
- _get_obs() Dict[str, Any][source]¶
Return the current observation for the coordination environment.
- Returns:
- A dictionary containing:
”base_obs”: The current observation from the base environment.
”novice_hidden”: Numpy array of hidden features from the novice policy (if open_novice).
”novice_logits”: Numpy array of output logits from the novice policy (if open_novice).
”expert_hidden”: Numpy array of hidden features from the expert policy (if open_expert).
”expert_logits”: Numpy array of output logits from the expert policy (if open_expert).
- Return type:
dict
Examples
>>> obs = env._get_obs()
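The conditional keys listed above can be sketched as follows. `sketch_get_obs` and the `"hidden"`/`"logits"` input keys are hypothetical; only the output key names come from the documentation.

```python
from typing import Any, Dict


def sketch_get_obs(
    base_obs: Any,
    novice_out: Dict[str, Any],
    expert_out: Dict[str, Any],
    open_novice: bool = True,
    open_expert: bool = False,
) -> Dict[str, Any]:
    """Assemble the observation dict, exposing agent internals only when 'open'."""
    obs: Dict[str, Any] = {"base_obs": base_obs}
    if open_novice:
        obs["novice_hidden"] = novice_out["hidden"]
        obs["novice_logits"] = novice_out["logits"]
    if open_expert:
        obs["expert_hidden"] = expert_out["hidden"]
        obs["expert_logits"] = expert_out["logits"]
    return obs


novice_out = {"hidden": [0.1, 0.2], "logits": [1.5, -0.5]}
expert_out = {"hidden": [0.9, 0.8], "logits": [0.2, 2.0]}
default_obs = sketch_get_obs([0, 0, 0], novice_out, expert_out)
full_obs = sketch_get_obs([0, 0, 0], novice_out, expert_out, open_expert=True)
```

With the default flags only the base observation and the novice entries are present; the expert entries appear once `open_expert=True`.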
- _get_reward(base_reward: numpy.ndarray, action: numpy.ndarray, done: numpy.ndarray) numpy.ndarray[source]¶
Compute the reward for the current step, including costs for expert queries and agent switching.
- Parameters:
base_reward (numpy.ndarray) – The base reward from the environment.
action (numpy.ndarray) – The action(s) taken (novice or expert).
done (numpy.ndarray) – Boolean flag(s) indicating whether the episode has ended for each environment.
- Returns:
The computed reward(s) after applying costs.
- Return type:
numpy.ndarray
Examples
>>> reward = env._get_reward(base_reward, action, done)
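A possible shape for the cost adjustment is sketched below. The formula is an assumption, not the documented implementation: the query cost is subtracted whenever the expert acts, and the switch cost whenever the acting agent differs from the previous step. The `done` argument of the real method is not modelled, and `prev_action` is a hypothetical input.

```python
from typing import List

NOVICE, EXPERT = 0, 1  # same values as CoordEnv.NOVICE and CoordEnv.EXPERT


def sketch_get_reward(
    base_reward: List[float],
    action: List[int],
    prev_action: List[int],
    query_cost: float,
    switch_cost: float,
) -> List[float]:
    """Subtract assumed expert-query and agent-switch costs from the base reward."""
    rewards = []
    for r, a, prev in zip(base_reward, action, prev_action):
        if a == EXPERT:
            r -= query_cost  # assumed: paid on every step the expert acts
        if a != prev:
            r -= switch_cost  # assumed: paid when the acting agent changes
        rewards.append(r)
    return rewards


reward = sketch_get_reward(
    base_reward=[1.0, 1.0, 0.0],
    action=[NOVICE, EXPERT, EXPERT],
    prev_action=[NOVICE, NOVICE, EXPERT],
    query_cost=0.25,
    switch_cost=0.125,
)
```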
- class duo_ai.GeneralCoordEnv(config: CoordinationConfig, base_env: gymnasium.Env, novice: duo.core.Policy, expert: duo.core.Policy, open_novice: bool = True, open_expert: bool = False)[source]¶
Bases:
CoordEnv
Coordination environment supporting recurrent policies.
This class supports policies that maintain a hidden state across steps, but can be less efficient for stateless policies than CoordEnv.
Examples
>>> config = CoordinationConfig()
>>> base_env = gym.make(...)
>>> novice = ...
>>> expert = ...
>>> env = GeneralCoordEnv(config, base_env, novice, expert)
- _compute_agents_action() numpy.ndarray[source]¶
Compute the actions for both novice and expert agents, supporting recurrent policies.
- Returns:
Array of actions to be passed to the base environment.
- Return type:
numpy.ndarray
Examples
>>> base_action = env._compute_agents_action()
- _compute_base_action(action: numpy.ndarray) numpy.ndarray[source]¶
Compute the environment-specific action for each agent, supporting recurrent policies.
- Parameters:
action (numpy.ndarray) – Array indicating which agent (novice or expert) acts for each environment.
- Returns:
Array of actions to be passed to the base environment.
- Return type:
numpy.ndarray
Examples
>>> base_action = env._compute_base_action(action)
- _get_obs() Dict[str, Any][source]¶
Return the current observation for the coordination environment, supporting recurrent policies.
- Returns:
- A dictionary containing:
”base_obs”: The current observation from the base environment.
”novice_hidden”: Numpy array of hidden features from the novice policy (if open_novice).
”novice_logits”: Numpy array of output logits from the novice policy (if open_novice).
”expert_hidden”: Numpy array of hidden features from the expert policy (if open_expert).
”expert_logits”: Numpy array of output logits from the expert policy (if open_expert).
- Return type:
dict
Examples
>>> obs = env._get_obs()
- class duo_ai.Evaluator(config: EvaluatorConfig, env: gym.Env)[source]¶
Evaluator for running policy evaluation on environments and summarizing results.
Examples
>>> evaluator = Evaluator(EvaluatorConfig(), env)
>>> summary = evaluator.evaluate(policy)
- config_cls¶
- config¶
- env¶
- evaluate(policy: duo_ai.core.Policy, num_episodes: int | None = None) Dict[str, Any][source]¶
Evaluate a policy on the environment and summarize the results.
- Parameters:
policy (duo_ai.core.Policy) – The policy to evaluate. Must implement an act method and have a .model attribute.
num_episodes (int, optional) – Number of episodes to run. If None, uses value from config.
- Returns:
A dictionary mapping split names to summary statistics for each evaluation.
- Return type:
dict
Examples
>>> summary = evaluator.evaluate(policy, num_episodes=100)
>>> print(summary['reward_mean'])
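The summarizing step can be sketched with the standard library. `sketch_summarize` is a hypothetical helper; only the `reward_mean` key appears in the example above, and the other keys are assumptions added for illustration.

```python
import statistics
from typing import Dict, List


def sketch_summarize(episode_rewards: List[float]) -> Dict[str, float]:
    """Reduce per-episode total rewards to a few summary statistics."""
    return {
        "reward_mean": statistics.fmean(episode_rewards),
        "reward_std": statistics.pstdev(episode_rewards),  # assumed key
        "num_episodes": len(episode_rewards),  # assumed key
    }


summary = sketch_summarize([1.0, 2.0, 3.0])
```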
- _eval_one_iteration(policy: duo_ai.core.Policy, env: gym.Env) None[source]¶
Run a single evaluation iteration for the policy on the environment.
- Parameters:
policy (duo_ai.core.Policy) – The policy to evaluate.
env (gym.Env) – The environment instance to evaluate on.
- Return type:
None
- duo_ai.get_global_variable(key)[source]¶
Retrieve the value of a global variable by key.
- Parameters:
key (str) – The key for the global variable.
- Returns:
The value of the global variable, or None if not set.
- Return type:
Any or None
Examples
>>> get_global_variable('device')
'cuda'
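The "value or None if not set" behaviour matches a plain dictionary lookup. The sketch below assumes a module-level dict as the store; `_GLOBALS`, `sketch_set_global_variable`, and `sketch_get_global_variable` are hypothetical names.

```python
from typing import Any, Dict, Optional

_GLOBALS: Dict[str, Any] = {}  # hypothetical module-level store


def sketch_set_global_variable(key: str, value: Any) -> None:
    """Record a value under the given key."""
    _GLOBALS[key] = value


def sketch_get_global_variable(key: str) -> Optional[Any]:
    """Return the stored value, or None if the key was never set."""
    return _GLOBALS.get(key)


sketch_set_global_variable("device", "cuda")
device = sketch_get_global_variable("device")
missing = sketch_get_global_variable("not_set")
```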
- duo_ai.__version__¶
- duo_ai.make_config(args: object, dotlist_args: object = None) core.config.MasterConfig[source]¶
Create and configure a MasterConfig object from command-line arguments.
- Parameters:
args (object) – Arguments object with a ‘config’ attribute.
dotlist_args (object, optional) – Additional dotlist arguments for configuration.
- Returns:
Configured MasterConfig object.
- Return type:
MasterConfig
Examples
>>> config = make_config(args)
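"Dotlist" commonly refers to override strings of the form `key.subkey=value`. Assuming that is what `dotlist_args` carries, the merge can be sketched on a nested dict as below. `sketch_apply_dotlist` is a hypothetical helper and leaves every value as a string, whereas a real configuration library would typically parse types.

```python
from typing import Any, Dict, List


def sketch_apply_dotlist(config: Dict[str, Any], dotlist: List[str]) -> Dict[str, Any]:
    """Apply 'a.b=value' style overrides to a nested dict in place."""
    for item in dotlist:
        dotted_key, _, value = item.partition("=")
        *parents, leaf = dotted_key.split(".")
        node = config
        for part in parents:
            # Create intermediate levels on demand.
            node = node.setdefault(part, {})
        node[leaf] = value  # kept as a string in this sketch
    return config


cfg = sketch_apply_dotlist(
    {"name": "default", "env": {"name": "procgen"}},
    ["name=my_experiment", "env.name=myenv", "seed=10"],
)
```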
- duo_ai.make_algorithm(config: object) object[source]¶
Instantiate an algorithm from the registry using the provided config.
- Parameters:
config (object) – Algorithm configuration object with a ‘name’ attribute.
- Returns:
Instantiated algorithm.
- Return type:
object
Examples
>>> algo = make_algorithm(config)
- duo_ai.make_policy(config: object, env: object) object[source]¶
Instantiate a policy from the registry using the provided config and environment.
- Parameters:
config (object) – Policy configuration object with a ‘name’ attribute.
env (object) – Environment instance.
- Returns:
Instantiated policy.
- Return type:
object
Examples
>>> policy = make_policy(config, env)
- duo_ai.load_policy(path: str, env: object) object[source]¶
Load a policy from a checkpoint file.
- Parameters:
path (str) – Path to the checkpoint file.
env (object) – Environment instance.
- Returns:
Loaded policy.
- Return type:
object
Examples
>>> policy = load_policy("checkpoint.ckpt", env)
- duo_ai.register_environment(name: str, config_cls: object) None[source]¶
Register an environment configuration class in the registry.
- Parameters:
name (str) – Name of the environment.
config_cls (object) – Environment configuration class.
- Return type:
None
Examples
>>> register_environment("myenv", MyEnvConfig)
- duo_ai.register_algorithm(name: str, algorithm_cls: object) None[source]¶
Register an algorithm class in the registry.
- Parameters:
name (str) – Name of the algorithm.
algorithm_cls (object) – Algorithm class.
- Return type:
None
Examples
>>> register_algorithm("ppo", PPOAlgorithm)
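The register and make functions above follow a name-to-class registry pattern, which can be sketched as follows. Everything here is a hypothetical stand-in: `_ALGORITHMS`, `SketchPPO`, and the two helper functions are not duo_ai names, and the duplicate-name check is an assumption.

```python
from types import SimpleNamespace
from typing import Any, Dict

_ALGORITHMS: Dict[str, type] = {}  # hypothetical registry: name -> class


def sketch_register_algorithm(name: str, algorithm_cls: type) -> None:
    """Store a class under a name so it can be looked up from a config."""
    if name in _ALGORITHMS:
        raise ValueError(f"Algorithm {name!r} is already registered")
    _ALGORITHMS[name] = algorithm_cls


def sketch_make_algorithm(config: Any) -> Any:
    """Look up the class named by config.name and instantiate it."""
    return _ALGORITHMS[config.name](config)


class SketchPPO:
    """Hypothetical algorithm class that just keeps its config."""

    def __init__(self, config: Any) -> None:
        self.config = config


sketch_register_algorithm("ppo", SketchPPO)
algo = sketch_make_algorithm(SimpleNamespace(name="ppo"))
```

Keeping construction behind a name lookup lets a string in the experiment config select the class, so custom components only need to be registered before the config is resolved.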