#How do 'n' and 'best_of' generate completion candidates?

torn lynx

How is the search among best_of for n completions conducted? Is it anything deeper than simply picking best_of (or n if best_of is not specified) start tokens, generating completions continuing from each of those, and then returning n of them?

(That can't be right since presumably the API can return several completions that start with the same tokens and only diverge later?)

Put another way: I get (I think) how the "best" completion is selected: by picking each token based on its having the highest probability (conditioned on the tokens in the completion up to that point). But how is the "next best" completion selected?

For example, these two choices of completions returned from the same prompt (with n>1) diverge at the 9th token (indicated). Why there? (And why are the logprobs for the tokens they share slightly different?)

['{"\\n": -0.0037023025, "\\n\\n": -6.352316}',
 '{"\\n": -1.5448071e-05, "An": -12.615716}',
 '{"The": -1.3678426, "An": -1.439065}',
 '{" ascent": -0.0020665573, " Everest": -7.384125}',
 '{" of": -0.006254864, " up": -5.486502}',
 '{" Mt": -0.04762134, " Mount": -3.1148455}',
 '{".": -0.02164162, " Everest": -3.8457437}',
 '{" Everest": -0.001416925, "E": -6.607183}',
 '{" begins": -0.86504513, " is": -0.9239994}', # <<< "is"
 '{" an": -0.59111094, " a": -0.99656755}',
 '{" incredible": -1.2450883, " ardu": -2.0375211}',
 '{" feat": -1.5022807, " experience": -1.5351253}',
 '{".": -0.28406015, " that": -2.4506993}',
...
['{"\\n": -0.0037023025, "\\n\\n": -6.352316}',
 '{"\\n": -1.5448071e-05, "An": -12.616774}',
 '{"The": -1.366957, "An": -1.444462}',
 '{" ascent": -0.0020985708, " Everest": -7.3576765}',
 '{" of": -0.0062700394, " up": -5.4812703}',
 '{" Mt": -0.047802635, " Mount": -3.1111228}',
 '{".": -0.02178044, " Everest": -3.839409}',
 '{" Everest": -0.001404806, "E": -6.615321}',
 '{" begins": -0.8689396, " is": -0.9175273}',  # <<< "begins"
 '{" with": -0.29068145, " in": -2.401031}',  
 '{" a": -0.5824688, " an": -1.7426355}',
 '{" long": -1.2253876, " trek": -1.6134099}',
 '{" journey": -1.4322438, " trek": -1.450603}',
torn lynx

First it's important to recall (or clarify) a few things:
(1) Completions are built by adding tokens one at a time, each chosen according to its computed likelihood (expressed as a logprob). Crucially, this is where the randomness of generated completions comes in: token selection is not deterministic (even though the logprobs for a given prompt are).
(2) Despite being fixed for a given prompt and output token, logprobs may differ slightly between runs, because small numerical inaccuracies accumulate over the many floating-point operations involved in computing them. This is not where the randomness of completions comes from.
(3) When n or best_of are provided as arguments you're effectively invoking the API max(best_of, n) times (with n and best_of omitted), to generate max(best_of, n) completions, from which n are selected.
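
To make (1) concrete, here is a minimal sketch of sampling a single token from its logprobs, using two steps copied from the example in the question. It assumes the choice is only between the two tokens shown (the API reports just the top candidates, while the real model samples over its whole vocabulary), and the helper names `sample_token` and `count_draws` are made up for illustration. It suggests why the completions agree where one token dominates and split at " begins" / " is", where the two options are nearly equally likely.

```python
import math
import random

# Top-2 logprobs copied from the example in the question. Ignoring the
# rest of the vocabulary is a simplifying assumption of this sketch.
SHARED_STEP = {" Everest": -0.001416925, "E": -6.607183}
DIVERGING_STEP = {" begins": -0.86504513, " is": -0.9239994}


def sample_token(logprobs, rng):
    """Draw one token with probability proportional to exp(logprob)."""
    tokens = list(logprobs)
    weights = [math.exp(logprobs[t]) for t in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0]


def count_draws(logprobs, draws=10_000, seed=0):
    """Sample the same step many times and count how often each token wins."""
    rng = random.Random(seed)
    counts = {t: 0 for t in logprobs}
    for _ in range(draws):
        counts[sample_token(logprobs, rng)] += 1
    return counts


shared_counts = count_draws(SHARED_STEP)
diverging_counts = count_draws(DIVERGING_STEP)
print(shared_counts)     # " Everest" wins almost every draw
print(diverging_counts)  # " begins" and " is" split the draws roughly evenly
```

At the shared step almost every draw picks the same token, so independent completions stay identical; at the diverging step each run is close to a coin flip.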

With that in mind, here's how it works: the n completions returned are simply the ones whose token logprobs sum to the highest totals, i.e. the most probable completions overall (not the ones with the highest logprob "per token", as the docs state, whatever that would mean).
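
As a rough sketch of that selection rule (not the actual server-side implementation, which isn't visible from the outside), the snippet below ranks candidates by the sum of their token logprobs and, for contrast, by the mean logprob per token. The candidate texts and numbers are invented for illustration, and `pick_best` is a hypothetical helper, not an API call.

```python
def total_logprob(token_logprobs):
    """Log-probability of the whole completion: the sum over its tokens."""
    return sum(token_logprobs)


def mean_logprob(token_logprobs):
    """Average logprob per token, shown only for comparison."""
    return sum(token_logprobs) / len(token_logprobs)


def pick_best(candidates, n, score=total_logprob):
    """Return the n candidates with the highest score.

    candidates is a list of (text, token_logprobs) pairs.
    """
    ranked = sorted(candidates, key=lambda c: score(c[1]), reverse=True)
    return ranked[:n]


# Invented candidates standing in for best_of = 3 sampled completions.
candidates = [
    ("short", [-0.5, -0.5]),                    # sum -1.0, mean -0.5
    ("long", [-0.3, -0.3, -0.3, -0.3, -0.3]),   # sum about -1.5, mean about -0.3
    ("unlikely", [-2.0, -1.0]),                 # sum -3.0, mean -1.5
]

best_by_sum = [text for text, _ in pick_best(candidates, n=2)]
best_by_mean = [text for text, _ in pick_best(candidates, n=2, score=mean_logprob)]
print(best_by_sum)
print(best_by_mean)
```

The two scores can rank the same candidates differently: summing favors shorter completions, since every extra token adds a negative term, while averaging does not.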