Introduction
This page is a general introduction to the OpenAI API.
Background
The OpenAI API is powered by a GPT model, of which there have been quite a few variations. OpenAI seems to want it called "the OpenAI API" or just "the API", but they haven't really branded it. And because there are lots of different APIs on the internet, and as just mentioned OpenAI didn't brand theirs, everyone just calls it GPT (or GPT-3, the most recent version they've written about).
Playground
Playground is a website OpenAI set up that lets you stick in text and have the API autocomplete whatever you started with. The text you start with is the prompt, which is also called the context.
Playground has several options you can change:
- Temperature - randomness of output
- Max Tokens - maximum number of tokens to generate
- Frequency Penalty - prevent repetition based on token frequency
- Presence Penalty - prevent repetition based on whether a token has appeared previously
- start - what to prepend input with
- restart - what to append to the prompt
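(start and restart aren't model parameters; Playground just splices them into the text for you.) The other knobs map directly onto parameters of the completion endpoint. A minimal sketch, assuming the original pre-1.0 `openai` Python package and a placeholder API key:

```python
import openai

openai.api_key = "YOUR_API_KEY"  # placeholder

response = openai.Completion.create(
    engine="davinci",             # base GPT-3 engine
    prompt="Once upon a time",    # the context to autocomplete
    temperature=0.7,              # randomness of output
    max_tokens=64,                # maximum number of tokens to generate
    frequency_penalty=0.0,        # penalize tokens by how often they've appeared
    presence_penalty=0.0,         # penalize tokens that have appeared at all
)
print(response["choices"][0]["text"])
```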
A lot of people have been playing with it and getting interesting results.
Broad Strokes Picture of How GPT Crunches Text
GPT's a neural network. In middle school algebra, you've got an equation for a line, y = mx + b, where m is the slope of the line, x is a variable, b is the intercept, and y is the value spat out by the equation.
For instance, if we have y = 3x + 1 and x = 1, then y = 3*1 + 1 = 4.
A neural network is essentially a bunch of those chained together, where you've got y = m1*x1 + m2*x2 + m3*x3 + … + mn*xn. It's a bit more complex because you're stacking a bunch of these on top of each other. Anyway, the way we can talk about this big equation is we've got
- the weights: [m1, m2, m3, … mn]
- the inputs: [x1, x2, x3, …, xn]
- the output: Y
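As a toy sketch (made-up numbers, not real GPT weights), one of these linear units is just a dot product:

```python
weights = [3.0, -1.0, 0.5]   # the m's, fixed after training
inputs  = [1.0,  2.0, 4.0]   # the x's, derived from the prompt

y = sum(m * x for m, x in zip(weights, inputs))
print(y)  # 3*1 + (-1)*2 + 0.5*4 = 3.0
```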
When you feed a prompt to the API, the prompt is basically turned into the [x1, x2, … xn]. Some models change entire words into those vectors (e.g. word2vec); others do it by character. GPT uses variable-size tokens depending on context; "hello" might be converted into ["hel", "lo"] or ["hello"] depending on the surrounding text. These tokens are then used to construct the inputs; for instance, "Hello, my name is bob" might become [1, 1, 5, … 0]. This is then multiplied by all the weights, so you have 1*m1 + 1*m2 + 5*m3 + … 0*mn, and that spits out our y. Changing the input changes y.
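You can see the variable-size tokens yourself with the open-source tiktoken library; its "gpt2" encoding uses the same byte-pair scheme as GPT-3 (a sketch for illustration, not part of the API itself):

```python
import tiktoken

enc = tiktoken.get_encoding("gpt2")          # GPT-2/GPT-3 byte-pair encoding
ids = enc.encode("Hello, my name is Bob")    # token ids the model actually sees
pieces = [enc.decode([i]) for i in ids]      # the text chunk each id stands for
print(pieces)   # common words stay whole; rarer strings get split into pieces
```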
Now, this is a bit more complicated because the output Y is actually a distribution of probabilities [y1, y2, … yn], where the y's sum to 1. At temperature 0, you always select the y with the highest probability as the next token. At higher temperatures, lower-probability y's get selected more often.
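A minimal sketch of temperature sampling (this is the standard softmax construction, not necessarily OpenAI's exact sampler):

```python
import numpy as np

def sample_next_token(logits, temperature):
    """Pick the index of the next token from raw scores at a given temperature."""
    logits = np.asarray(logits, dtype=float)
    if temperature == 0:
        return int(np.argmax(logits))        # greedy: always the top token
    scaled = logits / temperature            # high T flattens the distribution
    probs = np.exp(scaled - scaled.max())    # softmax (shifted for stability)
    probs /= probs.sum()
    return int(np.random.choice(len(probs), p=probs))
```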
Likewise, you can penalize the probabilities of tokens if they've already appeared in the prompt. You can penalize if they've appeared at all (the presence_penalty) or you can penalize them by how often they've appeared (frequency_penalty). For instance, consider [a, a, b, a, a, a, a]. Suppose the output probabilities are:
a = .5
b = .3
c = .2
With no penalty, 'a' has the highest value.
If we apply a flat penalty of, say, .4 to every token that's occurred at all, then you'll get
a = .1
b = -.1
c = .2
so the most probable token is now 'c'.
However, if you instead penalize by frequency, say by .05 per occurrence, 'a' is penalized by .3 (6 occurrences) and 'b' by .05 (1 occurrence), resulting in:
a = .2
b = .25
c = .2
so now 'b' is the most probable next token.
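Here's the same arithmetic in code (a sketch of the idea; the real API applies these penalties inside the model rather than to final probabilities):

```python
from collections import Counter

context = ["a", "a", "b", "a", "a", "a", "a"]   # tokens already generated
probs = {"a": 0.5, "b": 0.3, "c": 0.2}          # model's next-token probabilities
counts = Counter(context)

# presence: flat .4 penalty for any token that has appeared at all
presence = {t: p - 0.4 * (counts[t] > 0) for t, p in probs.items()}
# frequency: .05 penalty per occurrence
frequency = {t: p - 0.05 * counts[t] for t, p in probs.items()}

print(max(presence, key=presence.get))    # 'c'
print(max(frequency, key=frequency.get))  # 'b'
```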
Whenever you change the prompt, you're just changing the x's; this won't change the model's outputs for other prompts in the future. If you want to change how the model behaves across prompts, you need to change the values of the m's, which is done by fine-tuning the model.
When playing in Playground, you're pretty restricted in choosing what the next token you get back is going to be, so using a programming language to manipulate the text can be useful.
Programmatic Access
For programmatic access, you have the completion API, which does the next-token prediction described above, but you also have the search API. The search API works like this: in order to predict the next token, the completion API has to have some sense of what the text means. The search endpoint processes a request, reaches a couple of layers into the network to get that general sense of meaning for the query and for each document, and then compares the two, returning a similarity score per document.
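A rough sketch of what that looks like with the search endpoint (assuming the legacy pre-1.0 `openai` package; the exact call shape here is from the early documentation era, so treat it as an assumption):

```python
import openai

openai.api_key = "YOUR_API_KEY"  # placeholder

result = openai.Engine("davinci").search(
    documents=["White House", "hospital", "school"],
    query="the president",
)
for item in result["data"]:
    print(item["document"], item["score"])  # higher score = closer match
```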
The problem with search is that it isn't immediately clear what the comparison value means. Generally 250 is a good cutoff. However, I've had success doing a train/test split, finding the cutoff that optimizes the F1 score on the training set, and validating that it works on the test set.
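A sketch of that tuning procedure, assuming you've already collected search scores and binary relevance labels (the names here are hypothetical):

```python
import numpy as np
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

def best_cutoff(scores, labels):
    """Find the score cutoff that maximizes F1 on a training split."""
    s_train, s_test, y_train, y_test = train_test_split(
        scores, labels, test_size=0.25, random_state=0
    )
    candidates = np.unique(s_train)                  # every observed score
    cutoff = max(candidates, key=lambda c: f1_score(y_train, s_train >= c))
    print("held-out F1:", f1_score(y_test, s_test >= cutoff))
    return cutoff
```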
At any rate, programmatic access gives us a lot more options than Playground:
- You can generate multiple completions for a single prompt (see the sketch below)
- You can programmatically select among those completions
- You can find optimized context for each query
- You can have GPT generate queries for itself
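For instance, here's a sketch of the first two bullets: generating several candidates in one call and picking among them with an arbitrary rule (again assuming the legacy `openai` package):

```python
import openai

openai.api_key = "YOUR_API_KEY"  # placeholder

resp = openai.Completion.create(
    engine="davinci",
    prompt="Q: What's the capital of France?\nA:",
    max_tokens=10,
    temperature=0.8,
    n=5,                                   # ask for five candidate completions
    stop="\n",                             # cut each answer at the newline
)
candidates = [c["text"].strip() for c in resp["choices"]]
print(min(candidates, key=len))            # crude selection rule: shortest answer
```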