Intro
The API has several parameters; some are documented at the moment, others are not.
Completion API
Documented Parameters
engine
Engine (a string) selects which of the GPT models to use. There are four models, named after famous historical figures, and they increase in power in alphabetical order: Ada, Babbage, Curie, Davinci.
More powerful engines provide better results, but take longer to run.
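For example, a minimal completion request with the openai Python client looks something like the following (a sketch; the exact response structure is an assumption, not from the docs above):
!pip install openai
import openai

openai.api_key = "YOUR_API_KEY"
response = openai.Completion.create(
    engine="ada",            # cheapest/fastest engine; swap in "davinci" for better results
    prompt="Once upon a time",
    max_tokens=5,
)
print(response["choices"][0]["text"])   # the generated completion text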
max_tokens
Max_tokens (an int) is the maximum number of tokens the API will produce.
Tokens are pieces of text: single characters or common combinations of characters. It has to do with how the model was trained; instead of trying to predict the next letter or next word, the model was trained to predict the next token. The number of tokens in the prompt plus max_tokens cannot exceed 2048, or the API will return an error.
prompt
Prompt (a string) is the text that the API will try to predict the next token from.
Prompt can also take a list of token IDs (ints) if you want to turn the text into tokens yourself before sending it to the API. At the current time, the API uses the same encoding as GPT-2. The encoding is available from Hugging Face, which can be accessed in Python via:
!pip install transformers
from transformers import GPT2TokenizerFast
encoding = GPT2TokenizerFast.from_pretrained("gpt2-xl")  # all GPT-2 sizes share the same BPE vocabulary
tokens = encoding("YOUR TEXT")
To get the number of tokens, you can then do:
len(tokens["input_ids"])
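With that count in hand, you can sanity-check the 2048 limit mentioned under max_tokens before sending a request (a small sketch reusing the encoding object above):
prompt = "YOUR TEXT"
max_tokens = 100
prompt_tokens = len(encoding(prompt)["input_ids"])
if prompt_tokens + max_tokens > 2048:
    print(f"Too long: {prompt_tokens} prompt tokens + {max_tokens} max_tokens exceeds 2048")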
temperature
Temperature is a float (0-2). The API assigns a probability to each candidate for the next token. If temperature is 0, the API selects the most likely token; with higher values of temperature, the API is more willing to return lower-probability tokens.
E.g. suppose the model assigns a probability of 0.8 to 'A' appearing next and 0.2 to 'B'. With temperature 0, it will always pick 'A'; higher temperatures increase the likelihood that it selects 'B' instead. The higher max_tokens is, the more this randomness affects how creative the response can be.
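A rough sketch of what temperature does during sampling (illustrative only; not the API's actual implementation):
import numpy as np

def sample_with_temperature(logits, temperature, rng=np.random.default_rng()):
    if temperature == 0:
        return int(np.argmax(logits))                        # greedy: always the most likely token
    scaled = np.asarray(logits, dtype=float) / temperature   # lower temperature sharpens the distribution
    probs = np.exp(scaled - scaled.max())
    probs /= probs.sum()
    return int(rng.choice(len(probs), p=probs))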
top_p
top_p is a float (0-1). It uses nucleus sampling instead of plain temperature-scaled sampling: only the most likely tokens, up to a cumulative probability of top_p, are considered.
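Roughly, nucleus sampling keeps only the smallest set of tokens whose cumulative probability reaches top_p and samples from that set (again an illustrative sketch, not the API's internals):
import numpy as np

def sample_top_p(probs, top_p, rng=np.random.default_rng()):
    probs = np.asarray(probs, dtype=float)
    order = np.argsort(probs)[::-1]                       # most likely tokens first
    cumulative = np.cumsum(probs[order])
    cutoff = int(np.searchsorted(cumulative, top_p)) + 1  # smallest set covering top_p of the mass
    nucleus = order[:cutoff]
    nucleus_probs = probs[nucleus] / probs[nucleus].sum()
    return int(rng.choice(nucleus, p=nucleus_probs))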
n
n is an int. It's the number of completions ("choices") to return. Useless at temp=0, but if you're introducing variance by sampling lower-probability tokens, you can get different results from the same prompt.
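For example, with a non-zero temperature you can ask for several completions at once and compare them (a sketch, reusing the hypothetical client call from the engine section):
response = openai.Completion.create(
    engine="davinci",
    prompt="My favorite food is",
    max_tokens=10,
    temperature=0.8,
    n=3,                     # ask for three different completions
)
for choice in response["choices"]:
    print(choice["text"])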
stream
stream is a boolean. If true, the API sends tokens back as they're completed; otherwise, it sends the whole completion back in one batch.
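With the openai client, stream=True gives you an iterator of partial responses instead of one final response (a sketch; the exact event fields are an assumption):
for event in openai.Completion.create(
    engine="davinci",
    prompt="Once upon a time",
    max_tokens=20,
    stream=True,
):
    print(event["choices"][0]["text"], end="")   # each event carries the newly generated text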
logprobs
logprobs takes an int (default None). The API will send back the token and logprob for that many of the most likely tokens at each position, regardless of which token it actually selects. (If temp=0, the top logprob corresponds to the selected token.)
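For example, logprobs=5 asks for the five most likely tokens at each position; in the response they show up under the choice's logprobs field (the field names shown are an assumption about the response format):
response = openai.Completion.create(
    engine="davinci",
    prompt="The capital of France is",
    max_tokens=1,
    temperature=0,
    logprobs=5,              # return the 5 most likely tokens and their logprobs per position
)
logprobs = response["choices"][0]["logprobs"]
print(logprobs["tokens"])         # the tokens actually generated
print(logprobs["top_logprobs"])   # per position, a dict of the most likely tokens -> logprob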
stop
stop takes a string. When the API would generate the stop sequence, it terminates and returns what it has generated so far, regardless of max_tokens.
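For example, in a Q&A-style prompt you can stop generation before the model starts inventing the next question (sketch):
response = openai.Completion.create(
    engine="davinci",
    prompt="Q: What is the capital of France?\nA:",
    max_tokens=50,
    temperature=0,
    stop="\nQ:",             # terminate before the model writes another question
)
print(response["choices"][0]["text"])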
Undocumented Parameters
frequency_penalty
frequency_penalty takes a float. It lowers the probability of a token in proportion to how often that token has already occurred (a combined sketch of both penalties follows presence_penalty below).
presence_penalty
presence_penalty takes a float. It lowers the probability of a token if it has occurred at all previously, regardless of how many times.
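A rough sketch of how the two penalties differ, applied to the model's per-token scores before sampling (illustrative; the API's exact formula isn't documented here, so treat this as an assumption):
from collections import Counter

def apply_penalties(logits, generated_tokens, frequency_penalty, presence_penalty):
    # logits: dict mapping token -> score; generated_tokens: tokens produced so far
    counts = Counter(generated_tokens)
    penalized = dict(logits)
    for token, count in counts.items():
        if token in penalized:
            penalized[token] -= count * frequency_penalty  # grows with how often the token appeared
            penalized[token] -= presence_penalty           # flat penalty for appearing at all
    return penalized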
echo
echo takes a boolean. If true, the response echoes the prompt back and returns logprobs for the prompt tokens; logprobs must be on for this to be useful.
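One common trick: combining echo=True with logprobs and max_tokens=0 lets you score a prompt without generating anything new (a sketch; the max_tokens=0 combination is an assumption, not from the docs above):
response = openai.Completion.create(
    engine="davinci",
    prompt="The quick brown fox jumps over the lazy dog",
    max_tokens=0,             # generate nothing new
    echo=True,                # return the prompt itself in the output
    logprobs=0,               # required so the prompt tokens come back with logprobs
)
print(response["choices"][0]["logprobs"]["token_logprobs"])   # first entry is None (no context for the first token)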
Playground Params
start sequence
start sequence automatically prepends text to the prompt. You can emulate this by just adding it yourself before sending the prompt.
restart sequence
restart sequence automatically appends text to the prompt. E.g. when you do Q&A, it'll automatically add the '\nA:'. You can just add that before sending the prompt on the client side.
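Both playground parameters can be reproduced client-side with plain string concatenation (sketch; the sequences shown are just examples):
start_sequence = "Q: "      # hypothetical text the playground would prepend
restart_sequence = "\nA:"   # hypothetical appended text, as in the Q&A example above

user_text = "What is the capital of France?"
prompt = start_sequence + user_text + restart_sequence   # "Q: What is the capital of France?\nA:"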