duo_ai.policies

Submodules

Attributes

Classes

AlwaysPolicy

Policy that always selects the same agent (novice or expert) for every action.

LogitPolicy

Policy that selects actions by thresholding a confidence score computed from logits.

PPOPolicy

Policy class for PPO, wrapping a model and providing action selection and parameter management.

PyODPolicy

Policy that uses a PyOD outlier detector to select actions based on out-of-distribution (OOD) scores.

RandomPolicy

Policy that selects the expert action with a fixed probability.

Package Contents

class duo_ai.policies.AlwaysPolicy(config: AlwaysPolicyConfig, env: gym.Env)

Bases: duo_ai.core.policy.Policy

Policy that always selects the same agent (novice or expert) for every action.

Examples

>>> policy = AlwaysPolicy(AlwaysPolicyConfig(agent="novice"), env)
>>> obs = ...
>>> action = policy.act(obs)
config_cls
choice
device
config
act(obs: Any, temperature: float | None = None) -> torch.Tensor

Select the constant action for a batch of observations.

Parameters:
  • obs (dict or np.ndarray) – Batch of observations. If dict, must contain ‘base_obs’.

  • temperature (float, optional) – Unused. Included for API compatibility.

Returns:

Tensor of constant actions (agent indices) for the batch.

Return type:

torch.Tensor

Raises:

ValueError – If obs is not a dict or numpy array.

Examples

>>> action = policy.act(obs)
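
The behaviour described for act can be illustrated with a minimal, dependency-free sketch. It is not the library's implementation: plain lists stand in for NumPy arrays and torch tensors, and the NOVICE/EXPERT index values and the constant_actions helper are hypothetical names chosen for this example.

```python
from typing import Any, List

# Hypothetical agent indices; the real values are not documented on this page.
NOVICE, EXPERT = 0, 1


def constant_actions(obs: Any, choice: int) -> List[int]:
    """Return `choice` once for every observation in the batch."""
    if isinstance(obs, dict):
        batch = obs["base_obs"]  # dict observations must carry 'base_obs'
    elif isinstance(obs, (list, tuple)):
        batch = obs  # a list stands in for a NumPy array in this sketch
    else:
        raise ValueError(f"Unsupported observation type: {type(obs).__name__}")
    return [choice] * len(batch)


actions = constant_actions({"base_obs": [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]}, EXPERT)
print(actions)  # [1, 1, 1]
```

The batch size is read from the observations only to size the output, which is why the observation content itself never influences the result.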
reset(done: numpy.ndarray) -> None

Reset the policy state at episode boundaries.

Parameters:

done (np.ndarray) – Boolean array indicating which episodes in a batch require a reset.

Return type:

None

Examples

>>> policy.reset(done)
get_params() -> Dict[str, Any]

Get the current parameters of the policy.

Returns:

Dictionary of policy parameters.

Return type:

dict

Examples

>>> params = policy.get_params()
set_params(params: Dict[str, Any]) -> None

Set the parameters of the policy.

Parameters:

params (dict) – Dictionary of policy parameters to set.

Return type:

None

Examples

>>> policy.set_params(params)
train() -> None

Set the policy to training mode.

Return type:

None

Examples

>>> policy.train()
eval() -> None

Set the policy to evaluation mode.

Return type:

None

Examples

>>> policy.eval()
class duo_ai.policies.LogitPolicy(config: LogitPolicyConfig, env: gym.Env)

Bases: duo_ai.core.policy.Policy

Policy that selects actions by thresholding a confidence score computed from logits.

Examples

>>> policy = LogitPolicy(LogitPolicyConfig(), env)
>>> obs = ...
>>> action = policy.act(obs)
config_cls
config
params
device
EXPERT
act(obs: Dict[str, Any], temperature: float | None = None) -> torch.Tensor

Select actions based on confidence scores and threshold.

Parameters:
  • obs (dict) – Observation dictionary containing ‘novice_logits’.

  • temperature (float, optional) – Unused. Included for API compatibility.

Returns:

Tensor of selected actions (expert or not) for the batch.

Return type:

torch.Tensor

Examples

>>> action = policy.act(obs)
compute_confidence(logits: torch.Tensor) -> torch.Tensor

Compute confidence scores from logits using the configured metric.

Parameters:

logits (torch.Tensor) – Logits tensor from the policy.

Returns:

Confidence scores for each sample in the batch.

Return type:

torch.Tensor

Raises:

NotImplementedError – If the configured metric is not recognized.

Examples

>>> score = policy.compute_confidence(logits)
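
To make the idea of a configurable confidence metric concrete, here is a hedged, pure-Python sketch. The metric names ("max_prob", "neg_entropy", "margin"), the EXPERT/NOVICE index values, and the rule "defer to the expert when confidence falls below the threshold" are all assumptions for illustration; this page does not state which metrics or comparison direction the library uses.

```python
import math
from typing import List

# Hypothetical action indices.
NOVICE, EXPERT = 0, 1


def softmax(logits: List[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def compute_confidence(logits: List[List[float]], metric: str) -> List[float]:
    """One confidence score per row of logits (higher means more confident)."""
    scores = []
    for row in logits:
        probs = softmax(row)
        if metric == "max_prob":
            scores.append(max(probs))
        elif metric == "neg_entropy":
            scores.append(sum(p * math.log(p) for p in probs if p > 0.0))
        elif metric == "margin":
            top = sorted(probs, reverse=True)  # needs at least two classes
            scores.append(top[0] - top[1])
        else:
            raise NotImplementedError(f"Unknown confidence metric: {metric}")
    return scores


def select(logits: List[List[float]], metric: str, threshold: float) -> List[int]:
    # Assumed rule: low confidence hands control to the expert.
    return [EXPERT if s < threshold else NOVICE
            for s in compute_confidence(logits, metric)]


batch = [[4.0, 0.0, 0.0], [0.1, 0.0, 0.1]]
print(select(batch, "max_prob", threshold=0.7))  # [0, 1]
```

The first row is sharply peaked, so the novice keeps control; the second is nearly uniform, so the sketch defers to the expert.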
reset(done: numpy.ndarray) -> None

Reset the policy state at episode boundaries.

Parameters:

done (numpy.ndarray) – Boolean array indicating which episodes in a batch require a reset.

Return type:

None

Examples

>>> policy.reset(done)
get_params() -> Dict[str, Any]

Get the current parameters of the policy.

Returns:

Dictionary of policy parameters.

Return type:

dict

Examples

>>> params = policy.get_params()
set_params(params: Dict[str, Any]) -> None

Set the parameters of the policy.

Parameters:

params (dict) – Dictionary of policy parameters to set.

Return type:

None

Raises:

KeyError – If a parameter key is not recognized by the policy.

Examples

>>> policy.set_params({'threshold': 0.7})
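
The KeyError contract above can be sketched with a small stand-in class. ThresholdParams and its default value are hypothetical; the sketch only shows one reasonable way to reject unrecognized keys, not how the library does it.

```python
from typing import Any, Dict


class ThresholdParams:
    """Hypothetical stand-in for a policy's parameter store."""

    def __init__(self) -> None:
        self.params: Dict[str, Any] = {"threshold": 0.5}  # assumed default

    def set_params(self, params: Dict[str, Any]) -> None:
        unknown = [key for key in params if key not in self.params]
        if unknown:
            # Validate before mutating so a bad key leaves the state untouched.
            raise KeyError(f"Unrecognized parameter(s): {unknown}")
        self.params.update(params)

    def get_params(self) -> Dict[str, Any]:
        return dict(self.params)  # copy, so callers cannot mutate internal state


store = ThresholdParams()
store.set_params({"threshold": 0.7})
print(store.get_params())  # {'threshold': 0.7}
```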
train() -> None

Set the policy to training mode.

Return type:

None

Examples

>>> policy.train()
eval() -> None

Set the policy to evaluation mode.

Return type:

None

Examples

>>> policy.eval()
class duo_ai.policies.PPOPolicy(config: PPOPolicyConfig, env: gym.Env)

Bases: duo_ai.core.policy.Policy

Policy class for PPO, wrapping a model and providing action selection and parameter management.

Examples

>>> policy = PPOPolicy(PPOPolicyConfig(), env)
>>> obs = ...
>>> action = policy.act(obs)
config_cls
model
config
reset(done: numpy.ndarray) -> None

Reset the policy state at episode boundaries.

Parameters:

done (numpy.ndarray) – Boolean array indicating which episodes in a batch require a reset.

Return type:

None

Examples

>>> policy.reset(done)
act(obs: Any, temperature: float = 1.0, return_model_output: bool = False) -> Any

Select an action based on the observation and temperature.

Parameters:
  • obs (Any) – Observation input to the policy.

  • temperature (float, optional) – Sampling temperature. If 0, selects the argmax action. Default is 1.0.

  • return_model_output (bool, optional) – If True, also return the model output. Default is False.

Returns:

action – Selected action, or (action, model_output) if return_model_output is True.

Return type:

torch.Tensor or tuple

Examples

>>> action = policy.act(obs)
>>> action, model_output = policy.act(obs, return_model_output=True)
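
The temperature semantics described above (argmax at 0, sampling otherwise) can be sketched in pure Python. This is an illustration under assumptions, not the PPOPolicy code: it operates on a single list of logits, and the sample_action helper and its rng argument are hypothetical.

```python
import math
import random
from typing import List, Optional


def sample_action(logits: List[float], temperature: float = 1.0,
                  rng: Optional[random.Random] = None) -> int:
    """Pick an action index from logits using temperature scaling."""
    if temperature == 0:
        # Greedy choice: index of the largest logit.
        return max(range(len(logits)), key=lambda i: logits[i])
    rng = rng or random.Random()
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(x - m) for x in scaled]
    # random.choices normalizes the weights, so this samples from
    # softmax(logits / temperature).
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]


print(sample_action([0.1, 2.0, -1.0], temperature=0))  # 1
```

Temperatures below 1 concentrate probability on the highest logit, while temperatures above 1 flatten the distribution toward uniform.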
set_params(params: Dict[str, Any]) -> None

Set the model parameters from a state dictionary.

Parameters:

params (dict) – State dictionary of model parameters.

Return type:

None

Examples

>>> policy.set_params(params)
get_params() -> Dict[str, Any]

Get the current model parameters as a state dictionary.

Returns:

State dictionary of model parameters.

Return type:

dict

Examples

>>> params = policy.get_params()
train() -> None

Set the policy/model to training mode.

Return type:

None

Examples

>>> policy.train()
eval() -> None

Set the policy/model to evaluation mode.

Return type:

None

Examples

>>> policy.eval()
class duo_ai.policies.PyODPolicy(config: PyODPolicyConfig, env: gym.Env)

Bases: duo_ai.core.policy.Policy

Policy that uses a PyOD outlier detector to select actions based on out-of-distribution (OOD) scores.

Examples

>>> policy = PyODPolicy(PyODPolicyConfig(), env)
>>> obs = ...
>>> action = policy.act(obs)
config_cls
config
threshold = None
device
clf
feature_type
EXPERT
_get_pyod_class(config: PyODPolicyConfig) -> type

Dynamically import and return the PyOD class specified in the config.

Parameters:

config (PyODPolicyConfig) – Configuration object for the policy.

Returns:

The PyOD class to instantiate.

Return type:

type

Raises:

ImportError – If the specified class cannot be imported.

Examples

>>> cls = policy._get_pyod_class(config)
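
Dynamic class loading of this kind is commonly built on importlib. The sketch below assumes the class is named by a dotted path string; how PyODPolicyConfig actually encodes the detector is not documented here, and load_class is a hypothetical helper. A standard-library class is used so the example runs without PyOD installed.

```python
import importlib


def load_class(dotted_path: str) -> type:
    """Resolve 'package.module.ClassName' to the class object."""
    module_name, _, class_name = dotted_path.rpartition(".")
    try:
        module = importlib.import_module(module_name)
        return getattr(module, class_name)
    except (ImportError, AttributeError, ValueError) as exc:
        # Normalize every failure mode to ImportError, matching the
        # documented contract of raising ImportError on failure.
        raise ImportError(f"Cannot import {dotted_path!r}") from exc


# With PyOD installed, a path such as "pyod.models.iforest.IForest"
# would be resolved the same way.
cls = load_class("collections.OrderedDict")
print(cls.__name__)  # OrderedDict
```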
reset(done: numpy.ndarray) -> None

Reset the policy state at episode boundaries.

Parameters:

done (numpy.ndarray) – Boolean array indicating which episodes in a batch require a reset.

Return type:

None

Examples

>>> policy.reset(done)
_make_input(obs: Dict[str, Any]) -> numpy.ndarray

Construct the input feature array for the PyOD model from the observation.

Parameters:

obs (dict) – Observation dictionary containing required features.

Returns:

Concatenated feature array for the PyOD model.

Return type:

np.ndarray

Raises:

AssertionError – If no features are selected for PyOD input.

Examples

>>> inp = policy._make_input(obs)
fit(data: Dict[str, Any]) -> None

Fit the PyOD model using the provided data.

Parameters:

data (dict) – Data dictionary containing features for fitting the model.

Return type:

None

Examples

>>> policy.fit(data)
get_train_scores() -> numpy.ndarray

Get the OOD decision scores from the PyOD model after fitting.

Returns:

Array of decision scores for the training data.

Return type:

np.ndarray

Examples

>>> scores = policy.get_train_scores()
act(obs: Dict[str, Any], temperature: float | None = None) -> torch.Tensor

Select actions based on OOD scores from the PyOD model.

Parameters:
  • obs (dict) – Observation dictionary containing required features.

  • temperature (float, optional) – Unused. Included for API compatibility.

Returns:

Tensor of selected actions (expert or not) for the batch.

Return type:

torch.Tensor

Examples

>>> action = policy.act(obs)
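
One plausible way to turn OOD scores into actions is to pick a threshold from the training scores and defer to the expert for anything scoring above it. The sketch below is an assumption, not the library's logic: PyOD detectors do report higher scores for more abnormal samples, but the percentile rule, the EXPERT/NOVICE index values, and both helper functions are hypothetical, and plain lists replace arrays.

```python
import math
from typing import List

# Hypothetical action indices.
NOVICE, EXPERT = 0, 1


def percentile_threshold(train_scores: List[float], percentile: float) -> float:
    """Nearest-rank percentile of the training scores."""
    ordered = sorted(train_scores)
    rank = max(1, math.ceil(percentile * len(ordered) / 100))
    return ordered[rank - 1]


def gate(scores: List[float], threshold: float) -> List[int]:
    # Assumed rule: a score above the threshold looks out-of-distribution,
    # so control is handed to the expert.
    return [EXPERT if s > threshold else NOVICE for s in scores]


train_scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
threshold = percentile_threshold(train_scores, 90)
print(threshold)                           # 0.9
print(gate([0.25, 0.95, 0.9], threshold))  # [0, 1, 0]
```

Raising the percentile makes the policy call the expert less often, which is the usual knob for trading expert cost against safety.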
set_params(params: Dict[str, Any]) -> None

Set the parameters of the policy.

Parameters:

params (dict) – Dictionary of policy parameters to set.

Return type:

None

Examples

>>> policy.set_params({'threshold': 0.5, 'clf': clf})
get_params() -> Dict[str, Any]

Get the current parameters of the policy.

Returns:

Dictionary of policy parameters.

Return type:

dict

Examples

>>> params = policy.get_params()
train() -> None

Set the PyOD model to training mode if applicable.

Return type:

None

Examples

>>> policy.train()
eval() -> None

Set the PyOD model to evaluation mode if applicable.

Return type:

None

Examples

>>> policy.eval()
class duo_ai.policies.RandomPolicy(config: RandomPolicyConfig, env: gym.Env)

Bases: duo_ai.core.policy.Policy

Policy that selects the expert action with a fixed probability.

Examples

>>> policy = RandomPolicy(RandomPolicyConfig(prob=0.7), env)
>>> obs = ...
>>> action = policy.act(obs)
config_cls
prob
device
EXPERT
config
act(obs: object, temperature: float | None = None) -> torch.Tensor

Select actions randomly based on the configured probability.

Parameters:
  • obs (dict or np.ndarray) – Batch of observations. If dict, must contain ‘base_obs’.

  • temperature (float, optional) – Unused. Included for API compatibility.

Returns:

Tensor of selected actions (expert or not) for the batch.

Return type:

torch.Tensor

Raises:

ValueError – If obs is not a dict or numpy array.

Examples

>>> action = policy.act(obs)
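
Selecting the expert with a fixed probability amounts to one Bernoulli draw per batch element. The following pure-Python sketch illustrates that idea under assumptions: the EXPERT/NOVICE index values, the random_actions helper, and its rng argument are hypothetical, and the real method returns a torch.Tensor rather than a list.

```python
import random
from typing import List, Optional

# Hypothetical action indices.
NOVICE, EXPERT = 0, 1


def random_actions(batch_size: int, prob: float,
                   rng: Optional[random.Random] = None) -> List[int]:
    """Choose EXPERT with probability `prob`, independently per element."""
    rng = rng or random.Random()
    # rng.random() is uniform on [0, 1), so prob=1.0 always yields EXPERT
    # and prob=0.0 always yields NOVICE.
    return [EXPERT if rng.random() < prob else NOVICE for _ in range(batch_size)]


print(random_actions(4, prob=1.0))  # [1, 1, 1, 1]
print(random_actions(4, prob=0.0))  # [0, 0, 0, 0]
```

Passing a seeded random.Random makes the draws reproducible, which is useful when comparing this baseline against learned policies.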
reset(done: numpy.ndarray) -> None

Reset the policy state at episode boundaries.

Parameters:

done (np.ndarray) – Boolean array indicating which episodes in a batch require a reset.

Return type:

None

Examples

>>> policy.reset(done)
set_params(params: dict) -> None

Set the parameters of the policy.

Parameters:

params (dict) – Dictionary of policy parameters to set.

Return type:

None

Examples

>>> policy.set_params({'prob': 0.5})
get_params() -> dict

Get the current parameters of the policy.

Returns:

Dictionary of policy parameters.

Return type:

dict

Examples

>>> params = policy.get_params()
train() -> None

Set the policy to training mode.

Return type:

None

Examples

>>> policy.train()
eval() -> None

Set the policy to evaluation mode.

Return type:

None

Examples

>>> policy.eval()
duo_ai.policies.registry