API Parameters


The API has several parameters; some are documented at the moment, others are not.

Completion API

Documented Parameters


engine (a string) selects which of the GPT models to use. There are four models, named after famous historical figures.

The engines are named in order of increasing power, which happens to be alphabetical order: Ada, Babbage, Curie, Davinci

More powerful engines provide better results, but take longer to run.


max_tokens (an int) is the maximum number of tokens the API will generate.

Tokens are either letters or combinations of letters. It has to do with how the model was trained: instead of trying to predict the next letter or next word, the model was trained to predict the next token. The number of tokens in the prompt plus max_tokens cannot exceed 2048, or the API will return an error.
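The budget arithmetic can be sketched in a few lines (the helper name here is made up for illustration):

```python
# Hypothetical helper: how many completion tokens fit in the 2048-token context
def max_completion_tokens(n_prompt_tokens, context_limit=2048):
    # tokens left for the completion once the prompt is counted
    return max(context_limit - n_prompt_tokens, 0)

max_completion_tokens(100)   # 1948 tokens left for the completion
```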


prompt (a string) is the text that the API will try to predict the next token from.

prompt can also take a list of ints if you want to turn the text into tokens yourself before sending it to the API. At the current time, the API uses the same encoding as GPT-2. The encoding is available from huggingface, which can be accessed in Python via:

!pip install transformers

from transformers import GPT2TokenizerFast
encoding = GPT2TokenizerFast.from_pretrained("gpt2-xl")

tokens = encoding("YOUR TEXT")["input_ids"]

to get the number of tokens, you can just do

len(tokens)



temperature is a float (0-2). Basically, the API assigns probabilities to each token that could appear next. If temperature is 0, the API always selects the most likely token. With higher values of temperature, the API is more willing to return tokens of lower probability.

E.g. suppose the model assigns 'A' a probability of 0.8 of appearing next and 'B' a probability of 0.2. With temperature 0, it will always pick 'A'; higher temperatures increase the likelihood that it selects 'B' instead. When max_tokens is high, this matters a lot for how random/creative the response can be.
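The example above can be sketched as code. This is a toy illustration of temperature sampling, not the API's actual implementation:

```python
import random

def sample_with_temperature(probs, temperature, rng=random.Random(0)):
    # probs: hypothetical token -> probability map from the model
    if temperature == 0:
        # greedy decoding: always pick the most likely token
        return max(probs, key=probs.get)
    # flatten (or sharpen) the distribution: p ** (1/T), then sample
    scaled = {t: p ** (1.0 / temperature) for t, p in probs.items()}
    r = rng.random() * sum(scaled.values())
    for token, weight in scaled.items():
        r -= weight
        if r <= 0:
            return token
    return token  # numerical fallback

sample_with_temperature({"A": 0.8, "B": 0.2}, 0)  # always "A"
```

At temperature 2, the weights become 0.8**0.5 ≈ 0.894 and 0.2**0.5 ≈ 0.447, so the odds shrink from 4:1 to roughly 2:1, which is why 'B' shows up more often.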


top_p is a float (0-1). Uses nucleus sampling instead of the probabilities from temperature: the API samples only from the smallest set of tokens whose cumulative probability reaches top_p.
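A toy sketch of nucleus sampling's filtering step (illustrative only, not the API's implementation):

```python
def nucleus_filter(probs, top_p):
    # keep the smallest set of tokens whose cumulative probability >= top_p
    kept, cumulative = {}, 0.0
    for token, p in sorted(probs.items(), key=lambda kv: kv[1], reverse=True):
        kept[token] = p
        cumulative += p
        if cumulative >= top_p:
            break
    # renormalize the surviving probabilities, then sample from these
    total = sum(kept.values())
    return {t: p / total for t, p in kept.items()}

nucleus_filter({"A": 0.5, "B": 0.3, "C": 0.15, "D": 0.05}, 0.9)
# keeps A, B, C (cumulative 0.95 >= 0.9) and drops D
```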


n is an int: the number of completions to return. Useless at temp=0, but if you're introducing variance by selecting lower-probability tokens, you can get different results from the same prompt.


stream is a boolean. If true, the API sends tokens back as they're completed. Otherwise, it sends back the whole completion in one batch.


logprobs takes an int (default None). The API will send back the token + logprob for that many of the most likely tokens at each position, regardless of which it actually selects. (If temp=0, the first logprob will be the selected token.)
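The values come back as natural-log probabilities, so exp() converts them back into ordinary probabilities (the number below is made up for illustration):

```python
import math

# a logprob of about -0.223 corresponds to a probability of about 0.8
logprob = -0.223
probability = math.exp(logprob)
```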


stop takes a string. When the API would generate the stop sequence, it terminates and returns, regardless of max_tokens.
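The effect can be emulated client-side by truncating at the first stop sequence. This is just a sketch of the behavior; the API actually stops generating rather than truncating after the fact:

```python
def truncate_at_stop(text, stop):
    # cut everything from the first occurrence of the stop sequence onward
    idx = text.find(stop)
    return text if idx == -1 else text[:idx]

truncate_at_stop("A: Paris.\nQ: next question", "\nQ:")  # -> "A: Paris."
```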

Undocumented Parameters


frequency_penalty takes a float. Lowers the probabilities of tokens based on how often they've already occurred in the text.


presence_penalty takes a float. Lowers the probabilities of tokens if they've previously occurred at all.
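A plausible mechanism for both penalties, sketched as logit adjustments. Since the parameters are undocumented, this is an assumption that just illustrates the idea, not the API's confirmed behavior:

```python
def apply_penalties(logits, counts, frequency_penalty=0.0, presence_penalty=0.0):
    # counts: how many times each token has already appeared in the text
    adjusted = {}
    for token, logit in logits.items():
        c = counts.get(token, 0)
        adjusted[token] = (logit
                           - c * frequency_penalty               # scales with repetition
                           - (1 if c > 0 else 0) * presence_penalty)  # flat, once seen
    return adjusted

apply_penalties({"the": 2.0, "cat": 1.0}, {"the": 3},
                frequency_penalty=0.5, presence_penalty=0.1)
# "the" is penalized (2.0 - 1.5 - 0.1); "cat" is untouched
```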


echo takes a boolean. Returns logprobs for the prompt as well; logprobs must be set for this to work.


Playground Params

start sequence

start sequence automatically appends text to the end of the prompt (e.g. the '\nA:' in a Q&A setup), cueing the model for its response. You can emulate this by just adding it yourself before sending the prompt.

restart sequence

restart sequence automatically appends text before each new input. E.g. when you do q/a, it'll automatically add the '\nQ: '. You can just add that before sending the prompt on the client side.
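Both sequences can be emulated client-side with plain string concatenation. The function name and sequence values below mimic a Q&A setup and are purely illustrative:

```python
def build_prompt(transcript, user_input,
                 restart_sequence="\nQ: ", start_sequence="\nA:"):
    # restart_sequence introduces the new user turn;
    # start_sequence cues the model to begin its answer
    return transcript + restart_sequence + user_input + start_sequence

build_prompt("", "What is the capital of France?")
# -> "\nQ: What is the capital of France?\nA:"
```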

Unless otherwise stated, the content of this page is licensed under Creative Commons Attribution-ShareAlike 3.0 License