duo_ai.policies
Submodules
Attributes
Classes
AlwaysPolicy | Policy that always selects the same agent (novice or expert) for every action.
LogitPolicy | Policy that selects actions based on logit confidence metrics and thresholds.
PPOPolicy | Policy class for PPO, wrapping a model and providing action selection and parameter management.
PyODPolicy | Policy that uses a PyOD outlier detector for action selection based on OOD scores.
RandomPolicy | Policy that selects the expert action with a fixed probability.
Package Contents
- class duo_ai.policies.AlwaysPolicy(config: AlwaysPolicyConfig, env: gym.Env)
Bases: duo_ai.core.policy.Policy
Policy that always selects the same agent (novice or expert) for every action.
Examples
>>> policy = AlwaysPolicy(AlwaysPolicyConfig(agent="novice"), env)
>>> obs = ...
>>> action = policy.act(obs)
- config_cls
- choice
- device
- config
- act(obs: Any, temperature: float | None = None) -> torch.Tensor
Select the constant action for a batch of observations.
- Parameters:
obs (dict or np.ndarray) – Batch of observations. If dict, must contain ‘base_obs’.
temperature (float, optional) – Unused. Included for API compatibility.
- Returns:
Tensor of constant actions (agent indices) for the batch.
- Return type:
torch.Tensor
- Raises:
ValueError – If obs is not a dict or numpy array.
Examples
>>> action = policy.act(obs)
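The behavior described above can be sketched without the library. This is a minimal illustration, not the actual implementation: the function name `constant_actions`, the agent indices `NOVICE = 0` and `EXPERT = 1`, and the use of plain lists in place of arrays and tensors are all assumptions made for the example.

```python
from typing import Any, List

NOVICE, EXPERT = 0, 1  # hypothetical agent indices, chosen for illustration


def constant_actions(obs: Any, choice: int) -> List[int]:
    """Return `choice` once per observation in the batch."""
    if isinstance(obs, dict):
        batch = obs["base_obs"]  # dict observations must carry 'base_obs'
    elif isinstance(obs, (list, tuple)):
        batch = obs  # a plain sequence stands in for np.ndarray here
    else:
        raise ValueError(f"unsupported observation type: {type(obs).__name__}")
    return [choice] * len(batch)


actions = constant_actions({"base_obs": [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]}, EXPERT)
```

The batch size is read from the observations only; their values never influence the result.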
- reset(done: numpy.ndarray) -> None
Reset the policy state at episode boundaries.
- Parameters:
done (np.ndarray) – Boolean array indicating which episodes in a batch require a reset.
- Return type:
None
Examples
>>> policy.reset(done)
- get_params() -> Dict[str, Any]
Get the current parameters of the policy.
- Returns:
Dictionary of policy parameters.
- Return type:
dict
Examples
>>> params = policy.get_params()
- set_params(params: Dict[str, Any]) -> None
Set the parameters of the policy.
- Parameters:
params (dict) – Dictionary of policy parameters to set.
- Return type:
None
Examples
>>> policy.set_params(params)
- class duo_ai.policies.LogitPolicy(config: LogitPolicyConfig, env: gym.Env)
Bases: duo_ai.core.policy.Policy
Policy that selects actions based on logit confidence metrics and thresholds.
Examples
>>> policy = LogitPolicy(LogitPolicyConfig(), env)
>>> obs = ...
>>> action = policy.act(obs)
- config_cls
- config
- params
- device
- EXPERT
- act(obs: Dict[str, Any], temperature: float | None = None) -> torch.Tensor
Select actions based on confidence scores and threshold.
- Parameters:
obs (dict) – Observation dictionary containing ‘novice_logits’.
temperature (float, optional) – Unused. Included for API compatibility.
- Returns:
Tensor of selected actions (expert or not) for the batch.
- Return type:
torch.Tensor
Examples
>>> action = policy.act(obs)
- compute_confidence(logits: torch.Tensor) -> torch.Tensor
Compute confidence scores from logits using the configured metric.
- Parameters:
logits (torch.Tensor) – Logits tensor from the policy.
- Returns:
Confidence scores for each sample in the batch.
- Return type:
torch.Tensor
- Raises:
NotImplementedError – If the configured metric is not recognized.
Examples
>>> score = policy.compute_confidence(logits)
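To make the confidence-and-threshold idea concrete, here is a minimal sketch using one common metric, the maximum softmax probability. The reference above does not list the metrics the library supports, so the metric, the helper names, the agent indices, and the direction of the comparison (defer to the expert when confidence is below the threshold) are all assumptions. Plain lists stand in for tensors.

```python
import math
from typing import List


def max_softmax_confidence(logits: List[List[float]]) -> List[float]:
    """Highest softmax probability per row; one possible confidence metric."""
    scores = []
    for row in logits:
        peak = max(row)
        exps = [math.exp(x - peak) for x in row]  # shift for numerical stability
        scores.append(max(exps) / sum(exps))
    return scores


def select_agent(logits: List[List[float]], threshold: float,
                 expert: int = 1, novice: int = 0) -> List[int]:
    """Defer to the expert when the novice's confidence is below the threshold."""
    return [expert if c < threshold else novice
            for c in max_softmax_confidence(logits)]


novice_logits = [[4.0, 0.0, 0.0], [0.1, 0.0, 0.2]]
actions = select_agent(novice_logits, threshold=0.7)
```

The first row is sharply peaked, so the novice keeps control; the second is nearly uniform, so the sketch hands control to the expert.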
- reset(done: numpy.ndarray) -> None
Reset the policy state at episode boundaries.
- Parameters:
done (numpy.ndarray) – Boolean array indicating which episodes in a batch require a reset.
- Return type:
None
Examples
>>> policy.reset(done)
- get_params() -> Dict[str, Any]
Get the current parameters of the policy.
- Returns:
Dictionary of policy parameters.
- Return type:
dict
Examples
>>> params = policy.get_params()
- set_params(params: Dict[str, Any]) -> None
Set the parameters of the policy.
- Parameters:
params (dict) – Dictionary of policy parameters to set.
- Return type:
None
- Raises:
KeyError – If a parameter key is not recognized by the policy.
Examples
>>> policy.set_params({'threshold': 0.7})
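The get/set contract, including the KeyError for unknown keys, might look like the following sketch. `ThresholdParams` is a hypothetical stand-in class, and treating 'threshold' as the only recognized key is an assumption taken from the example above.

```python
from typing import Any, Dict


class ThresholdParams:
    """Hypothetical parameter holder; only 'threshold' is a known key."""

    def __init__(self, threshold: float = 0.5) -> None:
        self.params: Dict[str, Any] = {"threshold": threshold}

    def get_params(self) -> Dict[str, Any]:
        return dict(self.params)  # copy so callers cannot mutate internal state

    def set_params(self, params: Dict[str, Any]) -> None:
        unknown = set(params) - set(self.params)
        if unknown:
            raise KeyError(f"unrecognized parameter(s): {sorted(unknown)}")
        self.params.update(params)


holder = ThresholdParams()
holder.set_params({"threshold": 0.7})
```

Validating all keys before updating means a rejected call leaves the stored parameters untouched.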
- class duo_ai.policies.PPOPolicy(config: PPOPolicyConfig, env: gym.Env)
Bases: duo_ai.core.policy.Policy
Policy class for PPO, wrapping a model and providing action selection and parameter management.
Examples
>>> policy = PPOPolicy(PPOPolicyConfig(), env)
>>> obs = ...
>>> action = policy.act(obs)
- config_cls
- model
- config
- reset(done: numpy.ndarray) -> None
Reset the policy state at episode boundaries.
- Parameters:
done (numpy.ndarray) – Boolean array indicating which episodes in a batch require a reset.
- Return type:
None
Examples
>>> policy.reset(done)
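The reference does not say what state, if any, this policy keeps between steps. As a sketch of how a `done` mask is typically consumed, the hypothetical class below tracks a per-environment step counter and clears it only where an episode has ended; a list of booleans stands in for the NumPy array.

```python
from typing import List


class EpisodeStepCounter:
    """Hypothetical per-environment state that is cleared on episode end."""

    def __init__(self, num_envs: int) -> None:
        self.steps = [0] * num_envs

    def step(self) -> None:
        self.steps = [s + 1 for s in self.steps]

    def reset(self, done: List[bool]) -> None:
        # Clear state only for the environments whose episode just finished.
        self.steps = [0 if d else s for s, d in zip(self.steps, done)]


counter = EpisodeStepCounter(num_envs=3)
counter.step()
counter.step()
counter.reset([False, True, False])
```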
- act(obs: Any, temperature: float = 1.0, return_model_output: bool = False) -> Any
Select an action based on the observation and temperature.
- Parameters:
obs (Any) – Observation input to the policy.
temperature (float, optional) – Sampling temperature. If 0, selects the argmax action. Default is 1.0.
return_model_output (bool, optional) – If True, also return the model output. Default is False.
- Returns:
action – Selected action, or (action, model_output) if return_model_output is True.
- Return type:
torch.Tensor or tuple
Examples
>>> action = policy.act(obs)
>>> action, model_output = policy.act(obs, return_model_output=True)
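The temperature parameter can be illustrated with a small sampling routine. This is a sketch under assumptions, not the wrapped model's code: `sample_action` is a hypothetical helper, the logits are a plain list, and the softmax-with-temperature scheme is the standard technique rather than something the reference confirms.

```python
import math
import random
from typing import List, Optional


def sample_action(logits: List[float], temperature: float = 1.0,
                  rng: Optional[random.Random] = None) -> int:
    """Pick an action index from logits, greedy when temperature is 0."""
    if temperature == 0:
        return max(range(len(logits)), key=logits.__getitem__)  # argmax
    rng = rng or random.Random()
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    weights = [math.exp(x - peak) for x in scaled]  # unnormalized softmax
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]


greedy = sample_action([0.2, 1.5, -0.3], temperature=0)
sampled = sample_action([0.2, 1.5, -0.3], temperature=1.0, rng=random.Random(0))
```

Lower temperatures concentrate probability on the highest logit, and higher ones flatten the distribution toward uniform.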
- set_params(params: Dict[str, Any]) -> None
Set the model parameters from a state dictionary.
- Parameters:
params (dict) – State dictionary of model parameters.
- Return type:
None
Examples
>>> policy.set_params(params)
- get_params() -> Dict[str, Any]
Get the current model parameters as a state dictionary.
- Returns:
State dictionary of model parameters.
- Return type:
dict
Examples
>>> params = policy.get_params()
- class duo_ai.policies.PyODPolicy(config: PyODPolicyConfig, env: gym.Env)
Bases: duo_ai.core.policy.Policy
Policy that uses a PyOD outlier detector for action selection based on OOD scores.
Examples
>>> policy = PyODPolicy(PyODPolicyConfig(), env)
>>> obs = ...
>>> action = policy.act(obs)
- config_cls
- config
- threshold = None
- device
- clf
- feature_type
- EXPERT
- _get_pyod_class(config: PyODPolicyConfig) -> type
Dynamically import and return the PyOD class specified in the config.
- Parameters:
config (PyODPolicyConfig) – Configuration object for the policy.
- Returns:
The PyOD class to instantiate.
- Return type:
type
- Raises:
ImportError – If the specified class cannot be imported.
Examples
>>> cls = policy._get_pyod_class(config)
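Dynamic class loading of this kind is usually built on `importlib`. The sketch below assumes the class is named by a dotted path such as `pyod.models.iforest.IForest`; how the config actually encodes the detector is not stated in this reference, and `load_class` is a hypothetical helper.

```python
import importlib


def load_class(dotted_path: str) -> type:
    """Import `package.module.ClassName` and return the class object."""
    module_name, _, class_name = dotted_path.rpartition(".")
    if not module_name:
        raise ImportError(f"expected 'module.ClassName', got {dotted_path!r}")
    module = importlib.import_module(module_name)  # raises ImportError if missing
    try:
        return getattr(module, class_name)
    except AttributeError as exc:
        raise ImportError(f"{module_name} has no attribute {class_name!r}") from exc


# A standard-library class keeps the demo runnable without PyOD installed.
cls = load_class("collections.OrderedDict")
```

Converting the `AttributeError` into an `ImportError` gives callers a single exception type to handle for both a missing module and a missing class.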
- reset(done: numpy.ndarray) -> None
Reset the policy state at episode boundaries.
- Parameters:
done (numpy.ndarray) – Boolean array indicating which episodes in a batch require a reset.
- Return type:
None
Examples
>>> policy.reset(done)
- _make_input(obs: Dict[str, Any]) -> numpy.ndarray
Construct the input feature array for the PyOD model from the observation.
- Parameters:
obs (dict) – Observation dictionary containing required features.
- Returns:
Concatenated feature array for the PyOD model.
- Return type:
np.ndarray
- Raises:
AssertionError – If no features are selected for PyOD input.
Examples
>>> inp = policy._make_input(obs)
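Feature concatenation of this kind can be sketched as follows. The feature key 'hidden' and the helper `make_input` are hypothetical, nested lists stand in for arrays, and the real method may select and order features differently.

```python
from typing import Dict, List, Sequence


def make_input(obs: Dict[str, List[List[float]]],
               feature_keys: Sequence[str]) -> List[List[float]]:
    """Concatenate the selected per-sample features into one row per sample."""
    assert feature_keys, "no features selected for detector input"
    batch_size = len(obs[feature_keys[0]])
    rows = []
    for i in range(batch_size):
        row: List[float] = []
        for key in feature_keys:
            row.extend(obs[key][i])  # append this feature's values for sample i
        rows.append(row)
    return rows


obs = {
    "hidden": [[0.1, 0.2], [0.3, 0.4]],  # hypothetical feature name
    "novice_logits": [[1.0, -1.0], [0.5, 0.5]],
}
features = make_input(obs, ["hidden", "novice_logits"])
```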
- fit(data: Dict[str, Any]) -> None
Fit the PyOD model using the provided data.
- Parameters:
data (dict) – Data dictionary containing features for fitting the model.
- Return type:
None
Examples
>>> policy.fit(data)
- get_train_scores() -> numpy.ndarray
Get the OOD decision scores from the PyOD model after fitting.
- Returns:
Array of decision scores for the training data.
- Return type:
np.ndarray
Examples
>>> scores = policy.get_train_scores()
- act(obs: Dict[str, Any], temperature: float | None = None) -> torch.Tensor
Select actions based on OOD scores from the PyOD model.
- Parameters:
obs (dict) – Observation dictionary containing required features.
temperature (float, optional) – Unused. Included for API compatibility.
- Returns:
Tensor of selected actions (expert or not) for the batch.
- Return type:
torch.Tensor
Examples
>>> action = policy.act(obs)
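One plausible way to connect the training scores to the `threshold` attribute is to take a percentile of those scores and defer to the expert whenever a new score exceeds it. The reference does not describe how the threshold is chosen, so the percentile rule, the helper names, and the agent indices below are assumptions; no detector is run, and the scores are made-up numbers.

```python
import math
from typing import List

NOVICE, EXPERT = 0, 1  # hypothetical agent indices


def percentile_threshold(train_scores: List[float], pct: float) -> float:
    """Nearest-rank percentile of the training scores (pct in (0, 100])."""
    ordered = sorted(train_scores)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def select_by_ood(scores: List[float], threshold: float) -> List[int]:
    """Defer to the expert when the outlier score exceeds the threshold."""
    return [EXPERT if s > threshold else NOVICE for s in scores]


# Illustrative scores only; a fitted detector would supply real ones.
train_scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
threshold = percentile_threshold(train_scores, 90)
actions = select_by_ood([0.25, 0.95, 0.9], threshold)
```

A higher percentile makes the sketch call the expert less often, since fewer observations score above the cutoff.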
- set_params(params: Dict[str, Any]) -> None
Set the parameters of the policy.
- Parameters:
params (dict) – Dictionary of policy parameters to set.
- Return type:
None
Examples
>>> policy.set_params({'threshold': 0.5, 'clf': clf})
- get_params() -> Dict[str, Any]
Get the current parameters of the policy.
- Returns:
Dictionary of policy parameters.
- Return type:
dict
Examples
>>> params = policy.get_params()
- class duo_ai.policies.RandomPolicy(config: RandomPolicyConfig, env: gym.Env)
Bases: duo_ai.core.policy.Policy
Policy that selects the expert action with a fixed probability.
Examples
>>> policy = RandomPolicy(RandomPolicyConfig(prob=0.7), env)
>>> obs = ...
>>> action = policy.act(obs)
- config_cls
- prob
- device
- EXPERT
- config
- act(obs: object, temperature: float | None = None) -> torch.Tensor
Select actions randomly based on the configured probability.
- Parameters:
obs (dict or np.ndarray) – Batch of observations. If dict, must contain ‘base_obs’.
temperature (float, optional) – Unused. Included for API compatibility.
- Returns:
Tensor of selected actions (expert or not) for the batch.
- Return type:
torch.Tensor
- Raises:
ValueError – If obs is not a dict or numpy array.
Examples
>>> action = policy.act(obs)
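Selecting the expert with a fixed probability amounts to one independent Bernoulli draw per batch element. The sketch below assumes exactly that; `random_actions` and the agent indices are hypothetical, and a seeded `random.Random` replaces whatever generator the library uses.

```python
import random
from typing import List, Optional

NOVICE, EXPERT = 0, 1  # hypothetical agent indices


def random_actions(batch_size: int, prob: float,
                   rng: Optional[random.Random] = None) -> List[int]:
    """Choose EXPERT with probability `prob`, independently per sample."""
    rng = rng or random.Random()
    return [EXPERT if rng.random() < prob else NOVICE for _ in range(batch_size)]


rng = random.Random(0)
actions = random_actions(10_000, prob=0.7, rng=rng)
expert_rate = sum(actions) / len(actions)  # should land near 0.7
```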
- reset(done: numpy.ndarray) -> None
Reset the policy state at episode boundaries.
- Parameters:
done (np.ndarray) – Boolean array indicating which episodes in a batch require a reset.
- Return type:
None
Examples
>>> policy.reset(done)
- set_params(params: dict) -> None
Set the parameters of the policy.
- Parameters:
params (dict) – Dictionary of policy parameters to set.
- Return type:
None
Examples
>>> policy.set_params({'prob': 0.5})
- get_params() -> dict
Get the current parameters of the policy.
- Returns:
Dictionary of policy parameters.
- Return type:
dict
Examples
>>> params = policy.get_params()
- duo_ai.policies.registry