Adversarial Natural Language Inference (ANLI) is a benchmark introduced by a team at Facebook. It evaluates how well a model can decide whether a context entails, contradicts, or is neutral toward a hypothesis — three possible labels in total.
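For concreteness, each item pairs a context with a hypothesis and one of the three gold labels. The sentences below are invented for illustration (they are not actual ANLI examples), and `accuracy` is just the score used throughout this page:

```python
# Each NLI item: a context, a hypothesis, and one of three gold labels.
# The example sentences are made up for illustration, not taken from ANLI.
ANLI_LABELS = ("entailment", "neutral", "contradiction")

examples = [
    {"context": "The meeting was moved from Monday to Friday.",
     "hypothesis": "The meeting happens on Friday.",
     "label": "entailment"},
    {"context": "The meeting was moved from Monday to Friday.",
     "hypothesis": "The meeting is in the main conference room.",
     "label": "neutral"},
    {"context": "The meeting was moved from Monday to Friday.",
     "hypothesis": "The meeting happens on Monday.",
     "label": "contradiction"},
]

def accuracy(predictions, gold):
    """Fraction of predicted labels that match the gold labels."""
    return sum(p == g for p, g in zip(predictions, gold)) / len(gold)
```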


In the GPT paper, they reached 40% on the R3 dev set.

Multi-pass approach

The hardest part seems to be finding a prompt that can separate out the neutral statements (those that are neither contradiction nor entailment).
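The multi-pass idea can be sketched as below. `classify_entail_vs_contradict` and `is_neutral` are hypothetical stand-ins for the two prompt calls, not actual implementations:

```python
def multi_pass(context, hypothesis, classify_entail_vs_contradict, is_neutral):
    """Two-pass labeling sketch: pass 1 forces a binary
    entail/contradict choice; pass 2 may overrule it with 'neutral'.
    Both classifier arguments are placeholders for model prompt calls.
    """
    label = classify_entail_vs_contradict(context, hypothesis)  # pass 1
    if is_neutral(context, hypothesis):                          # pass 2
        label = "neutral"
    return label
```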

Step 1: Entail vs. Contradict (42.25% R3 Dev)

JSON of step 1 results

Step 2: Entail vs. Neutral NCE (43.5% R3 Dev)

Step 2: Entail/Contradict vs. Neutral (40%-43% Dev)

Replacing both the contradiction and entailment labels drops the score down to 40%. Replacing only the examples previously labeled contradiction does raise it to 43%.
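The variant that helped can be sketched as a merge rule over the two passes. This is a hypothetical rendering of the relabeling described above, not the actual code:

```python
def merge_passes(first_pass, neutral_flags, replace_only="contradiction"):
    """Overwrite a first-pass label with 'neutral' only when the second
    pass flags the item as neutral AND the first-pass label matches
    `replace_only`. Passing replace_only=None replaces any flagged label,
    which corresponds to the variant that dropped the score to 40%.
    """
    merged = []
    for label, flag in zip(first_pass, neutral_flags):
        if flag and (replace_only is None or label == replace_only):
            merged.append("neutral")
        else:
            merged.append(label)
    return merged
```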

Neutral vs. Not:

Haven't run this yet to see whether it improves the overall score in conjunction with the previous steps.
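The untested ordering would screen for neutral first and only then force the binary choice. As above, both classifier arguments are hypothetical stubs for the prompt calls:

```python
def neutral_first(context, hypothesis, is_neutral, classify_entail_vs_contradict):
    """Reversed ordering sketch: ask 'neutral vs. not' first, and only
    run the binary entail/contradict prompt on the non-neutral items.
    Both classifier arguments are placeholders for model prompt calls.
    """
    if is_neutral(context, hypothesis):
        return "neutral"
    return classify_entail_vs_contradict(context, hypothesis)
```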

Unless otherwise stated, the content of this page is licensed under Creative Commons Attribution-ShareAlike 3.0 License