Logit Algorithms

The novice computes a confidence score based on its output logits. It issues a help request whenever the score is below a threshold.

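Notation:

\(z_i\): The novice's logit for action \(i\)
\(p_i\): The softmax probability of action \(i\), \(p_i = \exp(z_i) / \sum_j \exp(z_j)\)
\(p_k^{\downarrow}\): The \(k\)-th largest probability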

Supported metrics:

max_logit: The maximum logit value \(\max_i z_i\)
max_prob [1]: The maximum probability \(\max_i p_i\)
margin [2]: The difference between the highest and second-highest probabilities \(p_1^{\downarrow} - p_2^{\downarrow}\)
entropy [3]: The negative entropy of the action distribution \(\sum_i p_i \ln p_i\)
energy [4]: The log-sum-exp of the logits \(\ln \sum_i \exp(z_i)\)
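
As a rough illustration, the metrics above and the help-request rule could be computed as in the following sketch. It assumes that \(p\) is the softmax of the logits \(z\) and that every score is oriented so that higher means more confident. The function names `softmax`, `confidence_score`, and `should_request_help` are hypothetical and not part of any documented API.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax: shift by the max logit before exponentiating."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def confidence_score(logits: Sequence[float], metric: str) -> float:
    """Return a confidence score for one state (assumed: higher = more confident)."""
    probs = softmax(logits)
    if metric == "max_logit":
        return max(logits)
    if metric == "max_prob":
        return max(probs)
    if metric == "margin":
        # Requires at least two actions.
        top = sorted(probs, reverse=True)
        return top[0] - top[1]
    if metric == "entropy":
        # Negative entropy: sum_i p_i ln p_i (terms with p_i = 0 contribute 0).
        return sum(p * math.log(p) for p in probs if p > 0.0)
    if metric == "energy":
        # Log-sum-exp of the logits, computed stably.
        m = max(logits)
        return m + math.log(sum(math.exp(z - m) for z in logits))
    raise ValueError(f"unknown metric: {metric}")


def should_request_help(logits: Sequence[float], metric: str, threshold: float) -> bool:
    """Request help whenever the confidence score is below the threshold."""
    return confidence_score(logits, metric) < threshold
```

For two equally likely actions, for example, `confidence_score([0.0, 0.0], "max_prob")` is 0.5, so the novice would request help under any threshold strictly above 0.5.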

A challenge in this approach is choosing an appropriate threshold. We address it with the following adaptive procedure:

Exploration: Use the novice to explore the training environment, generating a set of states \(\mathcal{S}_{\text{train}}\).
Score Computation: For each state \(s \in \mathcal{S}_{\text{train}}\), compute its confidence score \(c(s)\). This results in a pool of confidence scores \(\mathcal{C} = \{c(s) \mid s \in \mathcal{S}_{\text{train}}\}\).
Threshold Selection: Use the \(n\)-th percentiles of \(\mathcal{C}\), for \(n = 0, 10, \ldots, 100\), as candidate thresholds.
Validation: For each candidate threshold, construct the policy that requests help whenever the confidence score falls below that threshold, and evaluate its performance on the validation tasks.
Test-Time Selection: Select the policy that yields the best validation performance and use it during testing.
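
The procedure above can be sketched as follows. This is a minimal illustration, not the actual implementation. It assumes the pool of confidence scores has already been collected from the training environment. It also assumes a caller-supplied `evaluate` callback (hypothetical) that builds the policy for a given threshold and returns its validation performance, with higher being better. Percentiles use linear interpolation between order statistics, which is one common convention; the source does not specify which one is used.

```python
from typing import Callable, List, Sequence, Tuple


def percentile(sorted_scores: Sequence[float], n: float) -> float:
    """n-th percentile (0 to 100) of pre-sorted scores, with linear interpolation."""
    if not sorted_scores:
        raise ValueError("need at least one score")
    pos = (len(sorted_scores) - 1) * n / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(sorted_scores) - 1)
    frac = pos - lo
    return sorted_scores[lo] + (sorted_scores[hi] - sorted_scores[lo]) * frac


def candidate_thresholds(scores: Sequence[float], step: int = 10) -> List[float]:
    """Candidate thresholds: the 0th, 10th, ..., 100th percentiles of the pool."""
    ordered = sorted(scores)
    return [percentile(ordered, n) for n in range(0, 101, step)]


def select_threshold(
    scores: Sequence[float],
    evaluate: Callable[[float], float],
) -> Tuple[float, float]:
    """Return (best_threshold, best_validation_performance).

    `evaluate` is a hypothetical callback: it should construct the
    help-requesting policy for the given threshold and return its
    performance on the validation tasks (higher is better).
    Ties keep the earliest (lowest-percentile) candidate.
    """
    best_threshold = None
    best_perf = None
    for threshold in candidate_thresholds(scores):
        perf = evaluate(threshold)
        if best_perf is None or perf > best_perf:
            best_threshold, best_perf = threshold, perf
    return best_threshold, best_perf
```

Because help is requested only when the score is strictly below the threshold, the 0th-percentile candidate never triggers a request on the training states, while the 100th-percentile candidate triggers one on every training state except those attaining the maximum score.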

References