duo_ai.policies

Submodules

Attributes

registry

Classes

`AlwaysPolicy`	Policy that always selects the same agent (novice or expert) for every action.
`LogitPolicy`	Policy that selects actions based on logit confidence metrics and thresholds.
`PPOPolicy`	Policy class for PPO, wrapping a model and providing action selection and parameter management.
`PyODPolicy`	Policy that uses a PyOD outlier detector for action selection based on OOD scores.
`RandomPolicy`	Policy that selects the expert action with a fixed probability.

Package Contents

class duo_ai.policies.AlwaysPolicy(config: AlwaysPolicyConfig, env: gym.Env)

Bases: duo_ai.core.policy.Policy

Policy that always selects the same agent (novice or expert) for every action.

Examples

>>> policy = AlwaysPolicy(AlwaysPolicyConfig(agent="novice"), env)
>>> obs = ...
>>> action = policy.act(obs)

config_cls

choice

device

config

act(obs: Any, temperature: float | None = None) → torch.Tensor

Select the constant action for a batch of observations.

Parameters:

obs (dict or np.ndarray) – Batch of observations. If dict, must contain ‘base_obs’.
temperature (float, optional) – Unused. Included for API compatibility.

Returns:

Tensor of constant actions (agent indices) for the batch.

Return type:

torch.Tensor

Raises:

ValueError – If obs is not a dict or numpy array.

Examples

>>> action = policy.act(obs)
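
The behaviour described above can be sketched without the library. This is only an illustration, not the actual implementation: the `AGENT_INDEX` mapping and the helper names are hypothetical (the source does not say which index denotes which agent), and plain Python lists stand in for NumPy arrays and torch tensors to keep the sketch dependency-free.

```python
from typing import Any, List

# Hypothetical mapping; the source does not state which index denotes which agent.
AGENT_INDEX = {"novice": 0, "expert": 1}


def batch_size_of(obs: Any) -> int:
    """Return the batch size of a dict or sequence observation."""
    if isinstance(obs, dict):
        # Documented contract: dict observations must contain 'base_obs'.
        return len(obs["base_obs"])
    if isinstance(obs, (list, tuple)):
        # Lists stand in for the documented np.ndarray input.
        return len(obs)
    raise ValueError(f"Unsupported observation type: {type(obs).__name__}")


def constant_act(obs: Any, agent: str) -> List[int]:
    """Return the same agent index for every observation in the batch."""
    return [AGENT_INDEX[agent]] * batch_size_of(obs)
```

Under these assumptions, `constant_act({"base_obs": [[0.0], [1.0]]}, "expert")` yields one expert index per observation, and an unsupported observation type raises `ValueError`, mirroring the documented contract.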

reset(done: numpy.ndarray) → None

Reset the policy state at episode boundaries.

Parameters: done (numpy.ndarray) – Boolean array indicating which episodes in a batch require a reset.
Return type: None

Examples

>>> policy.reset(done)

get_params() → Dict[str, Any]

Get the current parameters of the policy.

Returns: Dictionary of policy parameters.
Return type: dict

Examples

>>> params = policy.get_params()

set_params(params: Dict[str, Any]) → None

Set the parameters of the policy.

Parameters: params (dict) – Dictionary of policy parameters to set.
Return type: None

Examples

>>> policy.set_params(params)

train() → None

Set the policy to training mode.

Return type: None

Examples

>>> policy.train()

eval() → None

Set the policy to evaluation mode.

Return type: None

Examples

>>> policy.eval()

class duo_ai.policies.LogitPolicy(config: LogitPolicyConfig, env: gym.Env)

Bases: duo_ai.core.policy.Policy

Policy that selects actions based on logit confidence metrics and thresholds.

Examples

>>> policy = LogitPolicy(LogitPolicyConfig(), env)
>>> obs = ...
>>> action = policy.act(obs)

config_cls

config

params

device

EXPERT

act(obs: Dict[str, Any], temperature: float | None = None) → torch.Tensor

Select actions based on confidence scores and threshold.

Parameters:

obs (dict) – Observation dictionary containing ‘novice_logits’.
temperature (float, optional) – Unused. Included for API compatibility.

Returns:

Tensor of selected actions (expert or not) for the batch.

Return type:

torch.Tensor

Examples

>>> action = policy.act(obs)
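
A rough sketch of threshold-based selection follows. It assumes that the expert is chosen when the novice's confidence falls below the threshold; the source does not state the direction of the comparison, and the `EXPERT`/`NOVICE` index values and the function name are hypothetical.

```python
from typing import List

# Hypothetical action indices; the real values live on the policy (e.g. its EXPERT attribute).
EXPERT = 1
NOVICE = 0


def select_by_threshold(confidences: List[float], threshold: float) -> List[int]:
    """Defer to the expert wherever confidence is strictly below the threshold."""
    return [EXPERT if c < threshold else NOVICE for c in confidences]
```

With a threshold of 0.7, confidences of 0.9, 0.4 and 0.7 would map to novice, expert and novice respectively under this assumed convention.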

compute_confidence(logits: torch.Tensor) → torch.Tensor

Compute confidence scores from logits using the configured metric.

Parameters: logits (torch.Tensor) – Logits tensor from the policy.
Returns: Confidence scores for each sample in the batch.
Return type: torch.Tensor
Raises: NotImplementedError – If the configured metric is not recognized.

Examples

>>> score = policy.compute_confidence(logits)
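
The source does not list the supported metrics, so the sketch below shows three common ways to turn one row of logits into a confidence score. The metric names (`max_prob`, `margin`, `neg_entropy`) are illustrative assumptions, not the library's configuration values, and plain lists are used in place of torch tensors.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over one row of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def confidence(logits: List[float], metric: str) -> float:
    """Score one sample; a higher value means a more confident prediction."""
    probs = softmax(logits)
    if metric == "max_prob":
        # Probability of the most likely action.
        return max(probs)
    if metric == "margin":
        # Gap between the two most likely actions (needs at least two logits).
        top = sorted(probs, reverse=True)
        return top[0] - top[1]
    if metric == "neg_entropy":
        # Negative Shannon entropy: 0 for a one-hot distribution, lower when flatter.
        return sum(p * math.log(p) for p in probs if p > 0.0)
    raise NotImplementedError(f"Unknown metric: {metric}")
```

An unrecognized metric name raises `NotImplementedError`, matching the documented failure mode.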

reset(done: numpy.ndarray) → None

Reset the policy state at episode boundaries.

Parameters: done (numpy.ndarray) – Boolean array indicating which episodes in a batch require a reset.
Return type: None

Examples

>>> policy.reset(done)

get_params() → Dict[str, Any]

Get the current parameters of the policy.

Returns: Dictionary of policy parameters.
Return type: dict

Examples

>>> params = policy.get_params()

set_params(params: Dict[str, Any]) → None

Set the parameters of the policy.

Parameters: params (dict) – Dictionary of policy parameters to set.
Return type: None
Raises: KeyError – If a parameter key is not recognized by the policy.

Examples

>>> policy.set_params({'threshold': 0.7})
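
One way to obtain the documented `KeyError` behaviour is to validate every key before applying any update. The class below is a hypothetical stand-in written for illustration; it is not `LogitPolicy`, and the assumption that `threshold` is the only tunable parameter comes from the example above rather than from the source.

```python
from typing import Any, Dict


class ThresholdParams:
    """Hypothetical stand-in for a policy with a fixed set of tunable parameters."""

    def __init__(self, threshold: float = 0.5) -> None:
        self.params: Dict[str, Any] = {"threshold": threshold}

    def get_params(self) -> Dict[str, Any]:
        # Return a copy so callers cannot mutate internal state by accident.
        return dict(self.params)

    def set_params(self, params: Dict[str, Any]) -> None:
        # Check all keys first, so a bad key leaves the existing state untouched.
        unknown = [key for key in params if key not in self.params]
        if unknown:
            raise KeyError(f"Unrecognized parameter(s): {unknown}")
        self.params.update(params)
```

Validating before updating means a call that mixes valid and invalid keys changes nothing, which keeps the parameters consistent after a failed call.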

train() → None

Set the policy to training mode.

Return type: None

Examples

>>> policy.train()

eval() → None

Set the policy to evaluation mode.

Return type: None

Examples

>>> policy.eval()

class duo_ai.policies.PPOPolicy(config: PPOPolicyConfig, env: gym.Env)

Bases: duo_ai.core.policy.Policy

Policy class for PPO, wrapping a model and providing action selection and parameter management.

Examples

>>> policy = PPOPolicy(PPOPolicyConfig(), env)
>>> obs = ...
>>> action = policy.act(obs)

config_cls

model

config

reset(done: numpy.ndarray) → None

Reset the policy state at episode boundaries.

Parameters: done (numpy.ndarray) – Boolean array indicating which episodes in a batch require a reset.
Return type: None

Examples

>>> policy.reset(done)

act(obs: Any, temperature: float = 1.0, return_model_output: bool = False) → Any

Select an action based on the observation and temperature.

Parameters:

obs (Any) – Observation input to the policy.
temperature (float, optional) – Sampling temperature. If 0, selects the argmax action. Default is 1.0.
return_model_output (bool, optional) – If True, also return the model output. Default is False.

Returns:

action – Selected action, or (action, model_output) if return_model_output is True.

Return type:

torch.Tensor or tuple

Examples

>>> action = policy.act(obs)
>>> action, model_output = policy.act(obs, return_model_output=True)
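
The role of `temperature` can be illustrated with a small, dependency-free sketch. It assumes the usual convention of dividing the logits by the temperature before a softmax; the source only states that a temperature of 0 selects the argmax action. The function name `sample_action` and the optional `rng` argument are hypothetical.

```python
import math
import random
from typing import List, Optional


def sample_action(
    logits: List[float],
    temperature: float = 1.0,
    rng: Optional[random.Random] = None,
) -> int:
    """Pick an action index from one row of logits (assumes temperature >= 0)."""
    if temperature == 0:
        # Greedy choice: index of the largest logit (the first one on ties).
        return max(range(len(logits)), key=lambda i: logits[i])
    if rng is None:
        rng = random.Random()
    # Lower temperatures sharpen the distribution, higher ones flatten it.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    weights = [math.exp(x - peak) for x in scaled]  # unnormalized softmax
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]
```

Passing a seeded `random.Random` makes the sampling reproducible, which is convenient when comparing evaluation runs.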

set_params(params: Dict[str, Any]) → None

Set the model parameters from a state dictionary.

Parameters: params (dict) – State dictionary of model parameters.
Return type: None

Examples

>>> policy.set_params(params)

get_params() → Dict[str, Any]

Get the current model parameters as a state dictionary.

Returns: State dictionary of model parameters.
Return type: dict

Examples

>>> params = policy.get_params()

train() → None

Set the policy/model to training mode.

Return type: None

Examples

>>> policy.train()

eval() → None

Set the policy/model to evaluation mode.

Return type: None

Examples

>>> policy.eval()

class duo_ai.policies.PyODPolicy(config: PyODPolicyConfig, env: gym.Env)

Bases: duo_ai.core.policy.Policy

Policy that uses a PyOD outlier detector for action selection based on OOD scores.

Examples

>>> policy = PyODPolicy(PyODPolicyConfig(), env)
>>> obs = ...
>>> action = policy.act(obs)

config_cls

config

threshold = None

device

clf

feature_type

EXPERT

_get_pyod_class(config: PyODPolicyConfig) → type

Dynamically import and return the PyOD class specified in the config.

Parameters: config (PyODPolicyConfig) – Configuration object for the policy.
Returns: The PyOD class to instantiate.
Return type: type
Raises: ImportError – If the specified class cannot be imported.

Examples

>>> cls = policy._get_pyod_class(config)
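
Dynamic class loading of this kind is typically built on `importlib`. The helper below is a generic sketch, not the method's actual body: it assumes the class is identified by a dotted path such as `package.module.ClassName`, whereas the source does not say how `PyODPolicyConfig` stores the name. `load_class` is a hypothetical name.

```python
import importlib


def load_class(dotted_path: str) -> type:
    """Import 'package.module.ClassName' and return the class object."""
    module_name, _, class_name = dotted_path.rpartition(".")
    if not module_name:
        raise ImportError(f"Expected 'module.ClassName', got {dotted_path!r}")
    try:
        module = importlib.import_module(module_name)
        return getattr(module, class_name)
    except (ModuleNotFoundError, AttributeError) as exc:
        # Surface both failure modes as ImportError, as the documentation describes.
        raise ImportError(f"Cannot import {dotted_path!r}") from exc
```

For example, `load_class("collections.OrderedDict")` returns the standard-library class; with PyOD installed, a path such as `pyod.models.iforest.IForest` would resolve the same way.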

reset(done: numpy.ndarray) → None

Reset the policy state at episode boundaries.

Parameters: done (numpy.ndarray) – Boolean array indicating which episodes in a batch require a reset.
Return type: None

Examples

>>> policy.reset(done)

_make_input(obs: Dict[str, Any]) → numpy.ndarray

Construct the input feature array for the PyOD model from the observation.

Parameters: obs (dict) – Observation dictionary containing required features.
Returns: Concatenated feature array for the PyOD model.
Return type: numpy.ndarray
Raises: AssertionError – If no features are selected for PyOD input.

Examples

>>> inp = policy._make_input(obs)
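
The concatenation step might look roughly like the sketch below. It is an assumption-laden illustration: the `feature_keys` argument is hypothetical (the source only mentions a `feature_type` attribute without describing it), and nested lists stand in for NumPy arrays, so each row is one sample's feature vector.

```python
from typing import Any, Dict, List, Sequence


def make_input(obs: Dict[str, Any], feature_keys: Sequence[str]) -> List[List[float]]:
    """Concatenate the selected per-sample features into one row per sample."""
    assert len(feature_keys) > 0, "No features selected for detector input"
    batch = len(obs[feature_keys[0]])
    rows: List[List[float]] = [[] for _ in range(batch)]
    for key in feature_keys:
        block = obs[key]
        if len(block) != batch:
            raise ValueError(f"Feature {key!r} has a different batch size")
        # Append this feature's columns to every sample's row.
        for row, features in zip(rows, block):
            row.extend(features)
    return rows
```

With NumPy arrays, the same result would normally come from concatenating the feature blocks along the last axis.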

fit(data: Dict[str, Any]) → None

Fit the PyOD model using the provided data.

Parameters: data (dict) – Data dictionary containing features for fitting the model.
Return type: None

Examples

>>> policy.fit(data)

get_train_scores() → numpy.ndarray

Get the OOD decision scores from the PyOD model after fitting.

Returns: Array of decision scores for the training data.
Return type: numpy.ndarray

Examples

>>> scores = policy.get_train_scores()
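
Training scores are commonly used to pick the `threshold` attribute, which starts as `None`. The sketch below shows one possible approach, a nearest-rank quantile over the training scores; whether the library chooses its threshold this way is not stated in the source, and both function names are hypothetical. It relies on the PyOD convention that higher decision scores indicate more anomalous samples.

```python
import math
from typing import List, Sequence


def quantile_threshold(train_scores: Sequence[float], quantile: float) -> float:
    """Return the training score at the given quantile (nearest-rank method)."""
    if not train_scores:
        raise ValueError("train_scores must not be empty")
    if not 0.0 <= quantile <= 1.0:
        raise ValueError("quantile must lie in [0, 1]")
    ordered = sorted(train_scores)
    rank = max(1, math.ceil(quantile * len(ordered)))
    return ordered[rank - 1]


def flag_outliers(scores: Sequence[float], threshold: float) -> List[bool]:
    """Mark samples whose score is strictly above the threshold as out-of-distribution."""
    return [s > threshold for s in scores]
```

A threshold obtained this way could then be passed to `set_params` alongside the fitted detector, as in the `set_params` example for this class.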

act(obs: Dict[str, Any], temperature: float | None = None) → torch.Tensor

Select actions based on OOD scores from the PyOD model.

Parameters:

obs (dict) – Observation dictionary containing required features.
temperature (float, optional) – Unused. Included for API compatibility.

Returns:

Tensor of selected actions (expert or not) for the batch.

Return type:

torch.Tensor

Examples

>>> action = policy.act(obs)

set_params(params: Dict[str, Any]) → None

Set the parameters of the policy.

Parameters: params (dict) – Dictionary of policy parameters to set.
Return type: None

Examples

>>> policy.set_params({'threshold': 0.5, 'clf': clf})

get_params() → Dict[str, Any]

Get the current parameters of the policy.

Returns: Dictionary of policy parameters.
Return type: dict

Examples

>>> params = policy.get_params()

train() → None

Set the PyOD model to training mode if applicable.

Return type: None

Examples

>>> policy.train()

eval() → None

Set the PyOD model to evaluation mode if applicable.

Return type: None

Examples

>>> policy.eval()

class duo_ai.policies.RandomPolicy(config: RandomPolicyConfig, env: gym.Env)

Bases: duo_ai.core.policy.Policy

Policy that selects the expert action with a fixed probability.

Examples

>>> policy = RandomPolicy(RandomPolicyConfig(prob=0.7), env)
>>> obs = ...
>>> action = policy.act(obs)

config_cls

prob

device

EXPERT

config

act(obs: object, temperature: float | None = None) → torch.Tensor

Select actions randomly based on the configured probability.

Parameters:

obs (dict or np.ndarray) – Batch of observations. If dict, must contain ‘base_obs’.
temperature (float, optional) – Unused. Included for API compatibility.

Returns:

Tensor of selected actions (expert or not) for the batch.

Return type:

torch.Tensor

Raises:

ValueError – If obs is not a dict or numpy array.

Examples

>>> action = policy.act(obs)
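
Selecting the expert with a fixed probability amounts to one independent Bernoulli draw per observation. The sketch below illustrates that idea under stated assumptions: the `EXPERT`/`NOVICE` index values, the function name, and the `rng` argument are hypothetical, and a list of integers stands in for the returned tensor.

```python
import random
from typing import List, Optional

# Hypothetical action indices; the real values live on the policy (e.g. its EXPERT attribute).
EXPERT = 1
NOVICE = 0


def random_act(
    batch_size: int,
    prob: float,
    rng: Optional[random.Random] = None,
) -> List[int]:
    """Choose the expert independently for each sample with probability `prob`."""
    if rng is None:
        rng = random.Random()
    # rng.random() lies in [0, 1), so prob=0 never and prob=1 always picks the expert.
    return [EXPERT if rng.random() < prob else NOVICE for _ in range(batch_size)]
```

Supplying a seeded generator makes the draws repeatable across runs.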

reset(done: numpy.ndarray) → None

Reset the policy state at episode boundaries.

Parameters: done (numpy.ndarray) – Boolean array indicating which episodes in a batch require a reset.
Return type: None

Examples

>>> policy.reset(done)

set_params(params: dict) → None

Set the parameters of the policy.

Parameters: params (dict) – Dictionary of policy parameters to set.
Return type: None

Examples

>>> policy.set_params({'prob': 0.5})

get_params() → dict

Get the current parameters of the policy.

Returns: Dictionary of policy parameters.
Return type: dict

Examples

>>> params = policy.get_params()

train() → None

Set the policy to training mode.

Return type: None

Examples

>>> policy.train()

eval() → None

Set the policy to evaluation mode.

Return type: None

Examples

>>> policy.eval()

duo_ai.policies.registry