Logit Algorithms
================

The novice computes a confidence score based on its output logits. It issues a help request whenever the score is below a threshold.

**Notation:**

- :math:`z = (z_1, \dots, z_{|A|})` are the logits computed by the novice.
- :math:`p = \mathrm{Softmax}(z)` is the probability vector.
- :math:`p^{\downarrow}` denotes the elements of :math:`p` sorted in descending order.

**Supported metrics:**

- ``max_logit``: The maximum logit value :math:`\max_i z_i`
- ``max_prob`` [1]_: The maximum probability :math:`\max_i p_i`
- ``margin`` [2]_: The difference between the highest and second-highest probabilities :math:`p_1^{\downarrow} - p_2^{\downarrow}`
- ``entropy`` [3]_: The negative entropy of the action distribution :math:`\sum_i p_i \ln p_i`
- ``energy`` [4]_: The log-sum-exp of the logits :math:`\ln \sum_i \exp(z_i)`

A challenge in this approach is determining the appropriate threshold. We address this by proposing the following adaptive procedure:

1. **Exploration:** Use the novice to explore the training environment, generating a set of states :math:`\mathcal{S}_{\text{train}}`.
2. **Score Computation:** For each state :math:`s \in \mathcal{S}_{\text{train}}`, compute its confidence score :math:`c(s)`. This results in a pool of confidence scores :math:`\mathcal{C} = \{c(s) \mid s \in \mathcal{S}_{\text{train}}\}`.
3. **Threshold Selection:** Consider the :math:`n`-th percentiles of :math:`\mathcal{C}` as candidate thresholds (:math:`n = 0, 10, \dots, 100`).
4. **Validation:** For each candidate threshold, construct a policy and evaluate its performance on the validation tasks.
5. **Test-Time Selection:** Select the policy that yields the best validation performance and use it during testing.

References
----------

.. [1] David D. Lewis. "A sequential algorithm for training text classifiers: Corrigendum and additional data." *ACM SIGIR Forum*, 29:13–19, 1995.

.. [2] Burr Settles. "Active learning literature survey." Computer Sciences Technical Report 1648, University of Wisconsin–Madison, 2009.

.. [3] Tobias Scheffer, Christian Decomain, and Stefan Wrobel. "Active hidden Markov models for information extraction." In *International Symposium on Intelligent Data Analysis*, pages 309–318. Springer, 2001.

.. [4] Weitang Liu, Xiaoyun Wang, John Owens, and Yixuan Li. "Energy-based out-of-distribution detection." *Advances in Neural Information Processing Systems*, 33:21464–21475, 2020.
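
Example Sketch
--------------

The five metrics and the percentile-based candidate thresholds described above can be sketched in a few lines of NumPy. This is an illustrative sketch, not the library's actual implementation: the function names ``confidence_scores``, ``candidate_thresholds``, and ``needs_help`` are hypothetical, the use of NumPy is an assumption, and the percentile interpolation (NumPy's default, linear) is an assumption because the text does not specify one.

.. code-block:: python

    import numpy as np


    def confidence_scores(logits):
        """Return the five confidence scores for a 1-D array of finite logits.

        Hypothetical helper; requires at least two actions so that the
        margin is defined.
        """
        z = np.asarray(logits, dtype=float)
        # Numerically stable log-sum-exp: ln sum_i exp(z_i).
        z_max = z.max()
        lse = z_max + np.log(np.exp(z - z_max).sum())
        # Log-probabilities and probabilities of Softmax(z).
        log_p = z - lse
        p = np.exp(log_p)
        # Probabilities sorted in descending order.
        p_sorted = np.sort(p)[::-1]
        return {
            "max_logit": float(z_max),
            "max_prob": float(p_sorted[0]),
            "margin": float(p_sorted[0] - p_sorted[1]),
            # Negative entropy: sum_i p_i ln p_i (always <= 0).
            "entropy": float((p * log_p).sum()),
            "energy": float(lse),
        }


    def candidate_thresholds(score_pool):
        """Return the 0th, 10th, ..., 100th percentiles of a pool of scores."""
        percentiles = np.arange(0, 101, 10)
        return np.percentile(np.asarray(score_pool, dtype=float), percentiles)


    def needs_help(score, threshold):
        """The novice requests help whenever its score is below the threshold."""
        return bool(score < threshold)

Every metric is oriented so that a larger value means higher confidence, which is why a single "below threshold" rule can serve all five. Steps 4 and 5 of the procedure (validating each candidate and keeping the best) depend on the environment and policy, so they are left out of the sketch.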