Principle

Log probabilities of output tokens indicate the likelihood of each token occurring in the sequence given the context. To simplify, a logprob is log(p) (the natural logarithm), where p is the probability of a token occurring at a specific position given the previous tokens in the context. Some key points about logprobs:

  • Higher log probabilities suggest a higher likelihood of the token in that context. This allows users to gauge the model’s confidence in its output or explore alternative responses the model considered.
  • A logprob can be any negative number or 0.0. A logprob of 0.0 corresponds to 100% probability.
  • Logprobs allow us to compute the joint probability of a sequence as the sum of the logprobs of the individual tokens, which is useful for scoring and ranking model outputs. Another common approach is to take the average per-token logprob of a sentence to choose the best generation (see the sketch after this list).
  • We can examine the logprobs assigned to different candidate tokens to understand what options the model considered plausible or implausible.
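
To make these points concrete, here is a minimal sketch in plain Python with hypothetical, hard-coded logprob values for a three-token completion, showing the logprob-to-probability conversion and the joint and average scores described above:

```python
import math

# Hypothetical logprobs for a three-token completion
token_logprobs = [-0.01, -0.65, -0.30]

# Per-token probabilities: exponentiate each logprob
probs = [math.exp(lp) for lp in token_logprobs]  # [0.990, 0.522, 0.741]

# Joint probability of the whole sequence: sum the logprobs, then exponentiate
joint_logprob = sum(token_logprobs)              # -0.96
joint_prob = math.exp(joint_logprob)             # ≈ 0.383

# Average per-token logprob, useful for comparing generations of different lengths
avg_logprob = joint_logprob / len(token_logprobs)  # -0.32
```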

While there is a wide array of use cases for logprobs, this notebook will focus on their use for:

  1. Classification tasks
  • Large Language Models excel at many classification tasks, but accurately measuring the model’s confidence in its outputs can be challenging. logprobs provide a probability associated with each class prediction, enabling users to set their own classification or confidence thresholds (see the sketch after this list).
  2. Retrieval (Q&A) evaluation
  • logprobs can assist with self-evaluation in retrieval applications. In the Q&A example, the model outputs a contrived has_sufficient_context_for_answer boolean, which can serve as a confidence score of whether the answer is contained in the retrieved content. Evaluations of this type can reduce retrieval-based hallucinations and enhance accuracy.
  3. Autocomplete
  • logprobs could help us decide how to suggest words as a user is typing.
  4. Token highlighting and outputting bytes
  • Users can easily create a token highlighter using the built-in tokenization that comes with enabling logprobs. Additionally, the bytes parameter includes the UTF-8 byte values of each output token, which is particularly useful for reproducing emojis and special characters.
  5. Calculating perplexity
  • logprobs can be used to assess the model’s overall confidence in a result and to compare the confidence of results from different prompts (see the sketch after this list).
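
As a taste of the first and last use cases above, here is a hedged sketch with hypothetical hard-coded values (the "positive" class token, its logprob, and the 0.90 threshold are all invented for illustration) showing how a class prediction's logprob becomes a confidence score, and how perplexity is computed from per-token logprobs:

```python
import math

# Classification (use case 1): suppose the model answered a sentiment
# classification with the single token "positive" at a logprob of -0.05
# (hypothetical values).
class_logprob = -0.05
confidence = math.exp(class_logprob)  # ≈ 0.951, i.e. ~95% confidence

# An example user-defined confidence threshold
if confidence < 0.90:
    print("Low-confidence prediction; flag for review")

# Perplexity (use case 5): the exponential of the negative average
# per-token logprob. Lower perplexity indicates higher overall confidence.
token_logprobs = [-0.12, -0.48, -0.03, -0.91]  # hypothetical values
perplexity = math.exp(-sum(token_logprobs) / len(token_logprobs))
print(f"perplexity: {perplexity:.3f}")  # ≈ 1.470
```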

API

Parameters

The relevant request parameters are:

  • logprobs: Whether to return log probabilities of the output tokens or not. If true, returns the log probabilities of each output token returned in the content of message. This option is currently not available on the gpt-4-vision-preview model.
  • top_logprobs: An integer between 0 and 5 specifying the number of most likely tokens to return at each token position, each with an associated log probability. logprobs must be set to true if this parameter is used.
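
For example, a request enabling both parameters with the official openai Python SDK might look like the following sketch (the model name and prompt are placeholders; the response traversal follows the Chat Completions response shape):

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

completion = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder; any model that supports logprobs
    messages=[{"role": "user", "content": "Say 'hello world'."}],
    logprobs=True,    # return a logprob for each output token
    top_logprobs=2,   # also return the 2 most likely alternatives per position
)

# Each element carries the token, its logprob, and the top alternatives
for token_info in completion.choices[0].logprobs.content:
    print(token_info.token, token_info.logprob)
    for alt in token_info.top_logprobs:
        print("  candidate:", alt.token, alt.logprob)
```

Each element of logprobs.content also exposes a bytes field with the raw byte values of the token, which is what the token highlighting and bytes use case above relies on.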