Overview
Sometimes you want to know what a word means. These prompts get the meaning of a word, both on its own and in context. The examples here are evaluated on terms and contexts from the Word-in-Context (WiC) dev dataset, though the prompts are generally applicable.
Code: https://gist.github.com/brockmanmatt/3561d2408ed42804ef4ca66f9945c143
Loading WiC Dataset
We'll import the API keys, set up a query method that saves results, and download the WiC dataset to start.
To download the dataset, you can use wget:
!wget https://pilehvar.github.io/wic/package/WiC_dataset.zip
This does the initial setup:
import openai, json, pandas as pd, numpy as np, random, datetime

openai.api_key = "KEY GOES HERE"

def query(prompt, kwargs):
    """
    wrapper for the API that also saves each prompt/response pair to a timestamped JSON file
    """
    r = openai.Completion.create(prompt=prompt, **kwargs)["choices"][0]["text"].strip()
    with open("{}.json".format(datetime.datetime.now().strftime("%Y%m%d%H%M%S")), "w") as fh:
        json.dump({"prompt": prompt, "response": r}, fh, indent=4)
    return r
import zipfile

with zipfile.ZipFile("WiC_dataset.zip", "r") as zip_ref:
    zip_ref.extractall(".")
Getting Train Data
The dataset has a train, dev, and test set. We'll start with the train set for now.
train = pd.read_csv("train/train.data.txt", sep='\t', header=None)
train.columns = ["target", "pos", "position", "context-1", "context-2"]
train_gold = pd.read_csv("train/train.gold.txt", sep='\t', header=None)
train_gold.columns = ["label"]
train = pd.concat([train_gold,train], axis=1)
Simple Definition
We'll evaluate these prompts on a small set of 5 samples from the train set; later we'll move to dev, but for now train has plenty of examples for figuring out what the API gives us for different prompts. I want a mix of True and False labels, and since the last 5 noun samples are all True, I take the 5 before them, which have a nice mix.
miniTest = train[train.pos=="N"][-10:-5].copy()
miniTest
label | target | pos | position | context-1 | context-2 |
F | motion | N | 5-3 | The cinema relies on apparent motion . | He made a motion to adjourn . |
T | night | N | 4-5 | It vanished into the night . | The cat disappeared into the night . |
F | air | N | 6-6 | He threw the ball into the air . | A smell of chemicals in the air . |
T | sign | N | 4-0 | Those clouds show little sign of raining soon . | Signs of disease are objective , whereas symptoms are subjective . |
T | bed | N | 8-4 | We added a new rosebush to our rose bed . | The gardener planted a bed of roses . |
Raw Word
We can simply query what each word means on its own. However, the point of the dataset is that the same word can have different meanings in different contexts.
def getDefinition(term):
    prompt = """Q: What does {} mean?\nA:""".format(term)
    kwargs = {"engine": "davinci", "temperature": 0, "max_tokens": 15, "stop": "\n"}
    return query(prompt, kwargs)
for row in miniTest.iterrows():
    miniTest.at[row[0], "def"] = getDefinition(row[1]["target"])
target | context-1 | context-2 | def |
motion | The cinema relies on apparent motion . | He made a motion to adjourn . | Motion is the change in position of an object over time. |
night | It vanished into the night . | The cat disappeared into the night . | Night is the time when the sun is not shining. |
air | He threw the ball into the air . | A smell of chemicals in the air . | Air is a measurement of how much space is between the top of the tire |
sign | Those clouds show little sign of raining soon . | Signs of disease are objective , whereas symptoms are subjective . | It means that the person is a member of the sign. |
bed | We added a new rosebush to our rose bed . | The gardener planted a bed of roses . | Bed is a slang term for a place where you sleep. |
Word with Context
We can get at the contextual meaning by including the context in the prompt. This does better, but sometimes the model states the word's part of speech ("It's a noun.") instead of giving a definition.
def getDefinitionInContext(context, term):
    prompt = """Context: {}
Q: What does {} mean?\nA:""".format(context, term)
    kwargs = {"engine": "davinci", "temperature": 0, "max_tokens": 15, "stop": "\n"}
    return query(prompt, kwargs)
for row in miniTest.iterrows():
    miniTest.at[row[0], "context1Def"] = getDefinitionInContext(row[1]["context-1"], row[1]["target"])
    miniTest.at[row[0], "context2Def"] = getDefinitionInContext(row[1]["context-2"], row[1]["target"])
target | label | context-1 | context-2 | context1Def | context2Def |
motion | F | The cinema relies on apparent motion . | He made a motion to adjourn . | The apparent movement of objects in a series of still images. | It means to ask for something. |
night | T | It vanished into the night . | The cat disappeared into the night . | It means the time between sunset and sunrise. | It's a noun. |
air | F | He threw the ball into the air . | A smell of chemicals in the air . | It means the sky. | Air is a gas. |
sign | T | Those clouds show little sign of raining soon . | Signs of disease are objective , whereas symptoms are subjective . | sign = a signal or indication | A sign is something that is observed and measured. |
bed | T | We added a new rosebush to our rose bed . | The gardener planted a bed of roses . | Bed is a noun. It is a place where you sleep. | A place where you sleep. |
Fewshot Word with Context
We can add few-shot examples by manually labeling a few rows from the training set. First, generate a template of contexts and questions:
fewShot = "Provide the definition for each word in the context of the preceeding sentance\n\n"
for row in train[train.pos=="N"][:5].iterrows():
    fewShot += "Context: {}\n".format(row[1]["context-1"])
    fewShot += "Q: What is a {}?\nA:\n\n".format(row[1]["target"])
    fewShot += "Context: {}\n".format(row[1]["context-2"])
    fewShot += "Q: What is a {}?\nA:\n\n".format(row[1]["target"])
We can then take the output and modify it with the answers we want:
def getDefinitionInContextFewShot(context, term):
    fewShot = """Provide the definition for each word in the context of the preceeding sentance

Context: He wore a jock strap with a metal cup .
Q: What does cup mean?
A: something that contains

Context: Bees filled the waxen cups with honey .
Q: What does cup mean?
A: something that contains

Context: The Academy of Music .
Q: What does academy mean?
A: a place

Context: The French Academy .
Q: What does academy mean?
A: a society

Context: He got clearance to travel to America , even though he had previous links to terrorists .
Q: What does clearance mean?
A: permission

Context: The plane got clearance from air traffic control , and we were off .
Q: What does clearance mean?
A: permission

Context: Before laying sod on that clay , the ground needs two inches of coverage with topsoil .
Q: What does coverage mean?
A: the thing covering

Context: The dictionary 's coverage of standard English is excellent .
Q: What does coverage mean?
A: the thing covered

Context: Her death came as a terrible shock .
Q: What does death mean?
A: an event in time

Context: He had two deaths on his conscience .
Q: What does death mean?
A: an act

"""
    prompt = """Context: {}
Q: What does {} mean?\nA:""".format(context, term)
    kwargs = {"engine": "davinci", "temperature": 0, "max_tokens": 15, "stop": "\n"}
    return query(fewShot + prompt, kwargs)
This has some issues: it gets 2/5 wrong. It gives identical wording for one different-use pair ('air') and different wordings for one same-use pair ('sign').
for row in miniTest.iterrows():
    miniTest.at[row[0], "context1Def"] = getDefinitionInContextFewShot(row[1]["context-1"], row[1]["target"])
    miniTest.at[row[0], "context2Def"] = getDefinitionInContextFewShot(row[1]["context-2"], row[1]["target"])
target | label | context-1 | context-2 | context1Def | context2Def |
motion | F | The cinema relies on apparent motion . | He made a motion to adjourn . | a change in position | a proposal |
night | T | It vanished into the night . | The cat disappeared into the night . | a period of time | a period of time |
air | F | He threw the ball into the air . | A smell of chemicals in the air . | the substance around us | the substance around us |
sign | T | Those clouds show little sign of raining soon . | Signs of disease are objective , whereas symptoms are subjective . | a symbol | a signal |
bed | T | We added a new rosebush to our rose bed . | The gardener planted a bed of roses . | a place | a place |
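Rather than eyeballing agreement, we can score this mini set automatically by treating identical definition wording as a prediction that the two uses share a sense. A minimal sketch, rebuilding the table above as a standalone dataframe so it runs on its own:

```python
import pandas as pd

# Rows copied from the few-shot results table above.
miniTest = pd.DataFrame({
    "label":       ["F", "T", "F", "T", "T"],
    "context1Def": ["a change in position", "a period of time",
                    "the substance around us", "a symbol", "a place"],
    "context2Def": ["a proposal", "a period of time",
                    "the substance around us", "a signal", "a place"],
}, index=["motion", "night", "air", "sign", "bed"])

# Identical wording in both contexts -> predict same sense ("T").
pred = (miniTest["context1Def"] == miniTest["context2Def"]).map({True: "T", False: "F"})
accuracy = (pred == miniTest["label"]).mean()
print(accuracy)  # 0.6 -- 'air' and 'sign' are the two misses
```

This is the same exact-string comparison used for the dev-set evaluation later, so scoring by hand and scoring in code stay consistent.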
Few Shot Definition 2
We can switch up the wording and see how that changes it.
def getDefinitionInContextFewShot(context, term):
    fewShot = """Provide the definition for each word in the context of the preceeding sentance

Context: He wore a jock strap with a metal cup .
Q: What is the sense of 'cup'?
A: something that contains

Context: Bees filled the waxen cups with honey .
Q: What is the sense of 'cup'?
A: something that contains

Context: The Academy of Music .
Q: What is the sense of 'academy'?
A: a place

Context: The French Academy .
Q: What is the sense of 'academy'?
A: a society

Context: He got clearance to travel to America , even though he had previous links to terrorists .
Q: What is the sense of 'clearance'?
A: permission

Context: The plane got clearance from air traffic control , and we were off .
Q: What is the sense of 'clearance'?
A: permission

Context: Before laying sod on that clay , the ground needs two inches of coverage with topsoil .
Q: What is the sense of 'coverage'?
A: the thing covering

Context: The dictionary 's coverage of standard English is excellent .
Q: What is the sense of 'coverage'?
A: the thing covered

Context: Her death came as a terrible shock .
Q: What is the sense of 'death'?
A: an event in time

Context: He had two deaths on his conscience .
Q: What is the sense of 'death'?
A: an act

"""
    prompt = """Context: {}
Q: What is the sense of '{}'?\nA:""".format(context, term)
    kwargs = {"engine": "davinci", "temperature": 0, "max_tokens": 15, "stop": "\n"}
    return query(fewShot + prompt, kwargs)
for row in miniTest.iterrows():
    miniTest.at[row[0], "context1Def"] = getDefinitionInContextFewShot(row[1]["context-1"], row[1]["target"])
    miniTest.at[row[0], "context2Def"] = getDefinitionInContextFewShot(row[1]["context-2"], row[1]["target"])
This gives us
target | label | context-1 | context-2 | context1Def | context2Def |
motion | F | The cinema relies on apparent motion . | He made a motion to adjourn . | a change | a proposal |
night | T | It vanished into the night . | The cat disappeared into the night . | a period of time | a period of time |
air | F | He threw the ball into the air . | A smell of chemicals in the air . | the space above the ground | the stuff around us |
sign | T | Those clouds show little sign of raining soon . | Signs of disease are objective , whereas symptoms are subjective . | a signal | a mark |
bed | T | We added a new rosebush to our rose bed . | The gardener planted a bed of roses . | a place | a place |
Now it only gets one wrong ('sign')! We can try to improve this by context-stuffing the definition from one sentence into the prompt for the other.
Context Stuffing with Few Shot
To add context stuffing, we modify the previous few-shot method to also stuff in the labeled meaning of the other sentence: the query for context-1 gets context-2's definition appended as an extra example, and vice versa.
def getDefinitionInContextFewShotStuft(context, term, stuffing):
    fewShot = """Provide the definition for each word in the context of the preceeding sentance

Context: He wore a jock strap with a metal cup .
Q: What is the sense of 'cup'?
A: something that contains

Context: Bees filled the waxen cups with honey .
Q: What is the sense of 'cup'?
A: something that contains

Context: The Academy of Music .
Q: What is the sense of 'academy'?
A: a place

Context: The French Academy .
Q: What is the sense of 'academy'?
A: a society

Context: He got clearance to travel to America , even though he had previous links to terrorists .
Q: What is the sense of 'clearance'?
A: permission

Context: The plane got clearance from air traffic control , and we were off .
Q: What is the sense of 'clearance'?
A: permission

Context: Before laying sod on that clay , the ground needs two inches of coverage with topsoil .
Q: What is the sense of 'coverage'?
A: the thing covering

Context: The dictionary 's coverage of standard English is excellent .
Q: What is the sense of 'coverage'?
A: the thing covered

Context: Her death came as a terrible shock .
Q: What is the sense of 'death'?
A: an event in time

Context: He had two deaths on his conscience .
Q: What is the sense of 'death'?
A: an act

"""
    contextStuffing = ""
    for stuff in stuffing:
        contextStuffing += "Context: {}\nQ: What is the sense of '{}'?\nA: {}\n\n".format(stuff["context"], stuff["target"], stuff["sense"])
    prompt = """Context: {}
Q: What is the sense of '{}'?\nA:""".format(context, term)
    kwargs = {"engine": "davinci", "temperature": 0, "max_tokens": 15, "stop": "\n"}
    return query(fewShot + contextStuffing + prompt, kwargs)
for row in miniTest.iterrows():
    miniTest.at[row[0], "context1StuftDef"] = getDefinitionInContextFewShotStuft(row[1]["context-1"], row[1]["target"], [{"context": row[1]["context-2"], "target": row[1]["target"], "sense": row[1]["context2Def"]}])
    miniTest.at[row[0], "context2StuftDef"] = getDefinitionInContextFewShotStuft(row[1]["context-2"], row[1]["target"], [{"context": row[1]["context-1"], "target": row[1]["target"], "sense": row[1]["context1Def"]}])
This does better, getting all of the examples correct.
target | label | context-1 | context-2 | context1StuftDef | context2StuftDef |
motion | F | The cinema relies on apparent motion . | He made a motion to adjourn . | a proposal | a proposal |
night | T | It vanished into the night . | The cat disappeared into the night . | a period of time | a period of time |
air | F | He threw the ball into the air . | A smell of chemicals in the air . | the stuff around us | the space above the ground |
sign | T | Those clouds show little sign of raining soon . | Signs of disease are objective , whereas symptoms are subjective . | a signal | a signal |
bed | T | We added a new rosebush to our rose bed . | The gardener planted a bed of roses . | a place | a place |
Evaluate on Dev
Now we'll evaluate on the dev set! We only built few-shot examples from nouns, but let's see how it does on verbs as well. We'll run on just the first 100 rows, though we could check the whole set if we wanted.
We first load in the dev set
dev = pd.read_csv("dev/dev.data.txt", sep='\t', header=None)
dev.columns = ["target", "pos", "position", "context-1", "context-2"]
dev_gold = pd.read_csv("dev/dev.gold.txt", sep='\t', header=None)
dev_gold.columns = ["label"]
dev = pd.concat([dev_gold,dev], axis=1)
We'll then run both few-shot methods above:
devResults = {}
for row in dev[:100].iterrows():
    if row[0] in devResults:
        continue
    target = row[1]["target"]
    s1 = row[1]["context-1"]
    s2 = row[1]["context-2"]
    def1 = getDefinitionInContextFewShot(s1, target)
    def2 = getDefinitionInContextFewShot(s2, target)
    stuftDef1 = getDefinitionInContextFewShotStuft(s1, target, [{"context": s2, "target": target, "sense": def2}])
    stuftDef2 = getDefinitionInContextFewShotStuft(s2, target, [{"context": s1, "target": target, "sense": def1}])
    results = {"s1": s1, "s2": s2, "def1": def1, "def2": def2, "stuftDef1": stuftDef1, "stuftDef2": stuftDef2, "actual": row[1]["label"], "pos": row[1]["pos"]}
    devResults[row[0]] = results
    if row[0] % 20 == 0:
        print(row[0])
Because rows already in devResults are skipped, we can pause occasionally, or resume after a failure, without redoing everything.
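One way to make that restart behavior survive a kernel crash is to checkpoint devResults to disk after each row and reload it before the loop. A minimal sketch; the devResults.json filename is my own choice here, not part of the original code:

```python
import json
import os

CHECKPOINT = "devResults.json"  # hypothetical checkpoint file

def save_checkpoint(devResults):
    # persist the whole results dict; integer row indices become JSON strings
    with open(CHECKPOINT, "w") as fh:
        json.dump(devResults, fh, indent=4)

def load_checkpoint():
    # return an empty dict on first run, else restore with integer keys
    if not os.path.exists(CHECKPOINT):
        return {}
    with open(CHECKPOINT) as fh:
        return {int(k): v for k, v in json.load(fh).items()}
```

Calling devResults = load_checkpoint() before the loop and save_checkpoint(devResults) at the end of each iteration makes the existing `if row[0] in devResults: continue` check skip completed rows across sessions.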
We can then convert it to a pandas dataframe to easily manipulate the results.
devDf = pd.DataFrame(devResults).T
devDf.head()
s1 | s2 | def1 | def2 | stuftDef1 | stuftDef2 | actual | pos |
Room and board . | He nailed boards across the windows . | a flat surface | a flat piece of wood | food and lodging | a flat surface | F | N |
Circulate a rumor . | This letter is being circulated among the faculty . | to move | to move | to move | to move | F | V |
Hook a fish . | He hooked a snake accidentally , and was so scared he dropped his rod into the water . | a tool | to catch | to catch | a tool | T | V |
For recreation he wrote poetry and solved crossword puzzles . | Drug abuse is often regarded as a form of recreation . | the act of recreating | a form of activity | a form of activity | the act of recreating | T | N |
Making a hobby of domesticity . | A royal family living in unpretentious domesticity . | the state of being domestic | the quality of being domestic | the quality of being domestic | the state of being domestic | F | N |
We can then see how each comparison method did.
pct_1 = ((devDf["def1"] == devDf["def2"]).apply(lambda x: "T" if x else "F") == devDf["actual"]).mean()
print("accuracy comparing initial answers: {}".format(np.round(100*pct_1)))
pct_2 = ((devDf["def1"] == devDf["stuftDef2"]).apply(lambda x: "T" if x else "F") == devDf["actual"]).mean()
print("accuracy comparing one initial to one stuffed: {}".format(np.round(100*pct_2)))
pct_3 = ((devDf["stuftDef1"] == devDf["stuftDef2"]).apply(lambda x: "T" if x else "F") == devDf["actual"]).mean()
print("accuracy comparing both context stuffed: {}".format(np.round(100*pct_3)))
pct_4 = (((devDf["def1"] == devDf["stuftDef2"]) & (devDf["def2"] == devDf["stuftDef1"])).apply(lambda x: "T" if x else "F") == devDf["actual"]).mean()
print("accuracy comparing both of the initials to both the stuffed: {}".format(np.round(100*pct_4)))
method | accuracy |
comparing initial answers | 60.0 |
comparing one initial to one stuffed | 63.0 |
comparing both context stuffed | 55.0 |
comparing both few shot to both context stuffed | 55.0 |
Evaluating Number of Examples
One parameter we haven't looked at is how many examples to include (or whether to select them adaptively). So, on the dev set (checking just the first 300 rows), we can measure the effect of different numbers of examples; I labeled a few more examples by continuing through the head of the train set. Between 1 and 16 examples, comparing both context-stuffed outputs outperformed the other methods.
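The sweep itself can be set up by parameterizing the few-shot prefix over a list of hand-labeled (context, target, sense) triples. A sketch reusing the first few labeled examples from the prompts above (keeping their exact wording, typos included); buildFewShot and the labeled list are my own names for illustration:

```python
def buildFewShot(examples, n):
    """Build a few-shot prompt prefix from the first n labeled examples."""
    prompt = "Provide the definition for each word in the context of the preceeding sentance\n\n"
    for context, target, sense in examples[:n]:
        prompt += "Context: {}\nQ: What is the sense of '{}'?\nA: {}\n\n".format(context, target, sense)
    return prompt

# Hand-labeled triples, taken from the few-shot prompts above.
labeled = [
    ("He wore a jock strap with a metal cup .", "cup", "something that contains"),
    ("Bees filled the waxen cups with honey .", "cup", "something that contains"),
    ("The Academy of Music .", "academy", "a place"),
]

# Each n in 1..16 would then be swapped in for the fixed fewShot string
# in the dev loop, and the accuracies compared per n.
print(buildFewShot(labeled, 2))
```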

So what's going on here? It looks like varying the number of examples mostly shifts the true/false negative rate rather than the true/false positive rate, which stays close to 100% anyway.
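One way to check this is to split accuracy out by gold label, separating performance on same-sense ("T") pairs from different-sense ("F") pairs. A sketch with toy stand-in data (illustrative values, not real model output); on the real run, substitute the devDf built above:

```python
import pandas as pd

# Toy stand-in for devDf -- illustrative rows only.
devDf = pd.DataFrame({
    "def1":   ["a place", "to move", "a tool", "a period of time"],
    "def2":   ["a place", "to move", "a tool", "a span of years"],
    "actual": ["T", "T", "F", "F"],
})

# Identical wording -> predict same sense ("T"), as in the scoring above.
pred = (devDf["def1"] == devDf["def2"]).map({True: "T", False: "F"})
# Accuracy on "T" rows is the true-positive rate; on "F" rows, the true-negative rate.
by_label = (pred == devDf["actual"]).groupby(devDf["actual"]).mean()
print(by_label)
```

On this toy data the "T" rows score perfectly while the "F" rows split, mirroring the pattern described above: the same-sense rate sits near the ceiling, so differences between methods show up almost entirely in the different-sense rows.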
