ANLI

parent:
linguistics

Related:


Overview

Adversarial Natural Language Inference (ANLI) is a benchmark introduced by a team at Facebook. It evaluates how well a model can determine whether a context entails, contradicts, or is neutral toward a hypothesis; each example is labeled with one of those three classes.
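To make the task format concrete, here is a minimal sketch of an ANLI-style item. The context and hypothesis are invented for illustration, not drawn from the actual dataset.

```python
# The three NLI classes used by ANLI.
LABELS = ("entailment", "neutral", "contradiction")

# Hypothetical example, illustrative only (not from the dataset).
example = {
    "context": "The concert was cancelled because of the storm.",
    "hypothesis": "The concert did not take place.",
    "label": "entailment",  # the context supports the hypothesis
}

def is_valid(item):
    """Check that an item carries one of the three NLI labels."""
    return item["label"] in LABELS
```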

Approaches

In the GPT-3 paper, they got to about 40% on the R3 dev set.

Multi-pass approach

The hardest part seems to be finding a prompt that can separate out the neutral statements (those that are neither entailed nor contradicted).
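One way to frame each pass is as a two-way completion prompt. The template and wording below are illustrative assumptions, not the exact prompts used in the linked gists.

```python
def nli_prompt(context, hypothesis, label_a, label_b):
    """Build a two-way classification prompt for one pass.

    Illustrative template only; the real prompts live in the gists below.
    """
    return (
        f"Context: {context}\n"
        f'Question: Does the context {label_a} or {label_b} the statement '
        f'"{hypothesis}"?\n'
        "Answer:"
    )

# Example: a step-1 style prompt that only distinguishes entail vs. contradict.
prompt = nli_prompt(
    "The concert was cancelled because of the storm.",
    "The concert did not take place.",
    "entail",
    "contradict",
)
```

The model's completion would then be mapped back to one of the two labels, and a separate pass handles the neutral class.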

Step 1: Entail v. Contradict (42.25% R3 Dev)

https://gist.github.com/brockmanmatt/78f498ad04b1d2afaecc9ed05d4878f8

JSON of step 1 results

Step 2: Entail vs. Neutral NCE (43.5% R3 Dev)

https://gist.github.com/brockmanmatt/4e499f64fed585eff05219a1343c7886

Step 2: Entail/Contradict vs. Neutral (40%-43% Dev)

https://gist.github.com/brockmanmatt/3e00028633e05c4ef7afbc19246d045c

Trying to relabel both the contradiction and entailment predictions drops the score to 40%. Relabeling only the items previously labeled contradiction raises it to 43%.
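The relabeling step above can be sketched as follows. This is assumed logic reconstructed from the description, not the actual code from the gists: pass 1 assigns entail/contradict to everything, and pass 2 re-checks only the items pass 1 called contradiction, relabeling them neutral when the second pass says so.

```python
def combine_passes(pass1_labels, pass2_says_neutral):
    """Merge two passes: keep pass-1 labels, but let pass 2 flip
    previously labeled contradictions to neutral (assumed logic)."""
    final = []
    for label, neutral in zip(pass1_labels, pass2_says_neutral):
        if label == "contradiction" and neutral:
            final.append("neutral")
        else:
            final.append(label)
    return final
```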

Neutral v. Not:

Haven't run this yet to see whether, in conjunction with the previous steps, it improves the overall score.

Unless otherwise stated, the content of this page is licensed under Creative Commons Attribution-ShareAlike 3.0 License