ANES Survey

parent:
implicit-bias

Related:


Overview

The American National Election Studies (ANES) survey has been conducted since 1948, typically every two years, with interviews in the months before the election. The purpose of the ANES is to provide a snapshot of the American electorate. Because the same or similar questions are asked year after year, it can measure shifts in opinion on social issues. (according to gpt anyway)

The ANES homepage can be found at https://electionstudies.org/, and the full list of survey questions asked over the last 80 years is in the codebook at https://electionstudies.org/wp-content/uploads/2018/12/anes_timeseries_cdf_codebook_var.pdf. A sample of those questions is included below.

Unfortunately, I went with an example here that has 9 choices, and it turns out this is actually a hard problem! (I haven't been using MIP for anything; it was just a good generic example.) I guess I'll have a section on the pain that is MIP.

Setup

To begin with, we'll set up some Python code to make asking a question easy and to save prompts and responses. Here we define a function query that wraps the API, extracts the text response, and saves each query to disk.

import openai, datetime, json, pandas as pd

# arguments to send the API (assumes openai.api_key has already been set)
kwargs = {"engine": "davinci", "temperature": 0, "max_tokens": 150, "stop": "\n\n"}

def query(prompt, myKwargs = kwargs):
  """
  wrapper for the API to save the prompt and the result
  """

  r = openai.Completion.create(prompt=prompt, **myKwargs)["choices"][0]["text"].strip()
  # save each prompt/response pair to its own timestamped json file
  with open("{}.json".format(datetime.datetime.now().strftime("%Y%m%d%H%M%S%f")), "w") as fh:
    json.dump({"prompt":prompt, "response":r}, fh, indent=4)
  return r

prompts = {}
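
As a quick sanity check of the wrapper (a minimal sketch; it assumes openai.api_key has already been set and uses a throwaway question), we can send one prompt and confirm a timestamped JSON file appears:

# hypothetical sanity check; assumes openai.api_key is already set
print(query("q: Is health care an important issue in this country?\na:"))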

Examples

These examples go through the process of figuring out prompts that work for automating answers to the ANES survey questions. It isn't easy, and a lot of double checking helps to make sure that the answers aren't deviating unexpectedly. Work by the community on establishing benchmarks for accurate transformations of answers would probably be helpful.

Most Important Problem

Getting the Most Important Problem

We can set up the most important problem (MIP) prompt easily enough (VCF0875, p. 413 of the ANES codebook); we'll have the primer at the start followed by a q/a.

prompts["MIP"] = """{}
q: What do you think are the most important problems facing this country?
a:"""

We can then prime it and get the result.

newKwargs = kwargs.copy()
newKwargs["stop"] = "\n"
newKwargs["temperature"] = 0

prefix = "Crime is on the rise"
query(prompts["MIP"].format(prefix), myKwargs = newKwargs)
output: 'Crime, the economy, and immigration'

So we can try other primes, including telling it health care is important.

prefix = "The hospitals are totally not prepared for the pandemic. With unemployment rising, a lot of people are losing their health care."
query(prompts["MIP"].format(prefix), myKwargs = newKwargs)
output: 'The most important problems facing this country are the economy, the war, and the environment.'

prefix = "Health care is a huge issue in this country."
query(prompts["MIP"].format(prefix), myKwargs = newKwargs)
output: 'The most important problems facing this country are the economy and health care.'

The ANES questionnaire says we're supposed to follow up to find out what the most important problem actually is. We can use GPT for this as well.

At this point, we can break the issues down to figure out what they are:

prompts["getEachMIP"]="""Each text has multiple issues associated with it. They're listed below.
text: The most important issues are cats and dogs
problems:
cats
dogs

text: The important problems are food, sleep, and taxes
problems:
food
sleep
taxes

text: {}
problems:
"""

response = query(prompts["MIP"].format(prefix), myKwargs = newKwargs)
issues = query(prompts["getEachMIP"].format(response))
print(issues)
output:
economy
health care

From here, we can simply clarify which of the issues given back is the most important, using the ANES clarification question: "(IF MORE THAN ONE PROBLEM:) Of all you've told me (1996-LATER: Of those you've mentioned), what would you say is the single most important problem the country faces?"

So what we'll do is stick the answers back into the original prompt and see what it picks out.

prompts["clarifyMIP"] = """q: Of those you've mentioned, what would you say is the single most important problem the country faces?
a:"""

if len([x for x in issues.split("\n")]) > 1:
  clarificationPrompt = (prompts["MIP"].format(prefix)) + response
  clarificationPrompt += "\n\n" + prompts["clarifyMIP"]
  clarification = query(clarificationPrompt, newKwargs)
  print(clarification)

output: 'The single most important problem facing this country is the economy.'

Now, running the issue extraction again, we should get a single answer:

issues = query(prompts["getEachMIP"].format(clarification))
print(issues)
output: economy

Great! So far the code works; the problem is mapping that to the 12 or so allowed values.

The hard part: mapping the answers to the scale

So now we need to map this to the key that's provided for the types of answers we can get. They're as follows:

prompts["mipKey"] = """The Following numbers are associatd with issues:
01. AGRICULTURAL
02. ECONOMICS such as BUSINESS; CONSUMER ISSUES
03. FOREIGN AFFAIRS AND NATIONAL DEFENSE
04. GOVERNMENT FUNCTIONING
05. LABOR ISSUES
06. NATURAL RESOURCES
07. PUBLIC ORDER
08. RACIAL PROBLEMS
09. SOCIAL WELFARE
97. Other problems (incl. specific campaign issues)
98. DON'T KNOW
00. NONE

q: which issue is "war with vietname" most associated with?
a: 03. FOREIGN AFFAIRS AND NATIONAL DEFENSE

q: which issue is "unemployment" most associated with?
a: 09. SOCIAL WELFARE

q: which issue is "racial inquality" most associated with?
a: 08. RACIAL PROBLEMS

q: which issue is "{}" associated most associated with?
a:"""

issueCode = query(prompts["mipKey"].format(issues), myKwargs=newKwargs)
print(issueCode)
output: '02. ECONOMICS such as BUSINESS; CONSUMER ISSUES'

However, it turns out the prompt that's used to ID the Most Important Problem number is actually bad. We can test multiple issues, and it's obvious that this step needs improvement:

prefixes = ["Crime is on the rise.","Health care is a huge issue in this country.", "I think that we need to make sure to fund the military way more.", "I'm really concerned that farms aren't producing enough food.", "The government spending is through the roof, they'll need to raise taxes if they keep funding all these initiatives."]
for prefix in prefixes:
  print("Prefix: {}".format(prefix))
  response = query(prompts["MIP"].format(prefix), myKwargs = newKwargs)
  issues = query(prompts["getEachMIP"].format(response))

  if len([x for x in issues.split("\n")]) > 1:
    clarificationPrompt = (prompts["MIP"].format(prefix)) + response
    clarificationPrompt += "\n\n" + prompts["clarifyMIP"]
    clarification = query(clarificationPrompt, newKwargs)
    issues = query(prompts["getEachMIP"].format(clarification))

  if len(issues.split("\n"))>1: #check if still more than 1
    issues = issues.split("\n")[0]

  issueCode = query(prompts["mipKey"].format(issues), myKwargs=newKwargs)

  print("MIP: {}".format(issues))
  print("code: {}".format(issueCode))

output:
Prefix: Crime is on the rise.
MIP: text: The most important problems are crime and taxes
code: 01. AGRICULTURAL
Prefix: Health care is a huge issue in this country.
MIP: economy
code: 02. ECONOMICS such as BUSINESS; CONSUMER ISSUES
Prefix: I think that we need to make sure to fund the military way more.
MIP: poor people
code: 09. SOCIAL WELFARE
Prefix: I'm really concerned that farms aren't producing enough food.
MIP: poor
code: 09. SOCIAL WELFARE
Prefix: The government spending is through the roof, they'll need to raise taxes if they keep funding all these initiatives.
MIP: deficit
code: 01. AGRICULTURAL

So while this code succeeds in moving things around, it is not usable so far. At this point, we can define a function for the pipeline that gets the ANES code for a given prime, and then work on improving each of the sub-parts.

def getMIPCode(prefix, verbose=False):
  """Run the full MIP pipeline for a single prime and return the ANES issue code."""
  newKwargs = kwargs.copy()
  newKwargs["stop"] = "\n"

  if verbose:
    print("Prefix: {}".format(prefix))
  response = query(prompts["MIP"].format(prefix), myKwargs = newKwargs)
  issues = query(prompts["getEachMIP"].format(response))

  if len([x for x in issues.split("\n")]) > 1: # more than one issue; ask the clarification question
    clarificationPrompt = (prompts["MIP"].format(prefix)) + response
    clarificationPrompt += "\n\n" + prompts["clarifyMIP"]
    clarification = query(clarificationPrompt, newKwargs)
    issues = query(prompts["getEachMIP"].format(clarification))

  if len(issues.split("\n"))>1: #check if still more than 1
    issues = issues.split("\n")[0]

  issueCode = query(prompts["mipKey"].format(issues), myKwargs=newKwargs)

  if verbose:
    print("MIP: {}".format(issues))
    print("code: {}".format(issueCode))
  return issueCode
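
With the pipeline wrapped in a function, spot-checking a single prime is one call (a minimal sketch; the exact output depends on the engine and the prompts defined above):

# hypothetical spot check of the pipeline on one prime
getMIPCode("Health care is a huge issue in this country.", verbose=True)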

Improving ANES Code Matching

Iterative Pairwise Ranking

We can try pairwise testing each code against the next; e.g., is the output issue closer to AGRICULTURAL or ECONOMICS? Then is the winner of that comparison closer to FOREIGN AFFAIRS? And so on down the list.

def test_most_similar(issue):
  # ANES_MIP_tester.definitions is the issue-description dict shown in the Search section below
  definitions = ANES_MIP_tester.definitions
  keys = list(definitions.keys())

  prompt = """q: is 'a cat' more related to "A DOG" or "A TOASTER"?
a: A DOG

q: is 'space' more related to "PHYSICS" or "DON'T KNOW"?
a: PHYSICS

q: is 'painting' more related to "MATH" or "THE ARTS"?
a: THE ARTS

q: is '{}' more related to '{}' or '{}'?
a:"""

  # compare the first two codes, then compare the winner against each remaining code
  initialResponse = query(prompt.format(issue, keys[0], keys[1]))
  print("first response: {}".format(initialResponse))
  for i in range(2, len(keys)):
      initialResponse = query(prompt.format(issue, initialResponse, keys[i]))
      print("{} response: {}".format(i, initialResponse))

  print(initialResponse)

However, this doesn't work too well, and it usually ends up with "None". I'm not sure why that is, but I didn't try too hard because the next approach works better.

Search

Search actually performs pretty well.

We can write a definition for each of the issues:

definitions = {
  "AGRICULTURAL":"issues dealing with farming",
  "ECONOMICS such as BUSINESS; CONSUMER ISSUES":"issues dealing with the economy",
  "FOREIGN AFFAIRS AND NATIONAL DEFENSE":"issues dealing with foreign affairs and national defense",
  "GOVERNMENT FUNCTIONING":"issues dealing with government services",
  "LABOR ISSUES":"issues dealing with workers and employees",
  "NATURAL RESOURCES":"issues dealing with natural resources",
  "PUBLIC ORDER":"non-racial issues dealing with crime, civil liberties, and other rights",
  "RACIAL PROBLEMS":"issues dealing with racial equality and civil rights",
  "SOCIAL WELFARE":"issues dealing with unemployment, elderly care, aid to education, and other welfare concerns",
  "DON'T KNOW":"not knowing",
  "NONE OF THE ABOVE":"something else"
}

Next, we can take a bunch of issues and compare them to the definitions.

keys = list(definitions.keys())
test_responses = ["world hunger", "nuclear winter", "war", "taxes", "the environment", "the economy", "civil rights", "social security"]

dfs = {}

for example in test_responses:
  dfs[example] = pd.DataFrame()
  scores = openai.Engine("davinci").search(documents=[x for x in keys],query=example)["data"]
  for i in range(len(scores)):
    dfs[example].at[keys[i], "key"] = scores[i]["score"]

  scores = openai.Engine("davinci").search(documents=[definitions[x] for x in keys],query=example)["data"]
  for i in range(len(scores)):
    dfs[example].at[keys[i], "defn"] = scores[i]["score"]

This gives us a dataframe for each example, where we can see that both the highest-similarity term and the highest-similarity definition tend to match, but the definitions do better. Of course, we really need a bunch of labeled examples to run an F1 score against. (Also, there's a notebook that does a train/test split on labelled data to optimize the F1 score on this sort of task: https://github.com/brockmanmatt/OpenAISurveyWrapper/blob/master/02_evaluateFromSearch.ipynb)
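
As a rough way to eyeball the results without labels, we can print the top-scoring code for each test response from the dataframes built above (a minimal sketch using the dfs dict from the loop):

# print the highest-scoring code per test response, for both the bare keys and the definitions
for example, df in dfs.items():
  print("{}: key -> {}, defn -> {}".format(example, df["key"].idxmax(), df["defn"].idxmax()))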

Unless otherwise stated, the content of this page is licensed under Creative Commons Attribution-ShareAlike 3.0 License