Paper #2! REASONING ABOUT ENTAILMENT WITH NEURAL ATTENTION

This paper was used in Improving Language Understanding by Generative Pre-Training, which is the paper cited by GPT-2 that I covered last time.

First, let's review how this paper was used in Improving Language Understanding by Generative Pre-Training.

Usual (Encoder-Decoder)

input -> Encoder -> output h_n from Encoder -> input h_n to Decoder & input token to Decoder -> output predicted token from Decoder

gpt-2

input -> Decoder -> output predicted token from Decoder

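In other words, the input skips the Encoder and gets shoved straight into the Decoder, which then predicts the output sentence. The input here is data such as the input sentence and the label, split up by tokens.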
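To make "split up by tokens" a bit more concrete, here is a minimal sketch of how a premise and a hypothesis could be joined into one sequence for a decoder-only model. The special token strings (`<s>`, `$`, `<e>`) and the function name `build_decoder_input` are my own placeholders for illustration, and whitespace splitting stands in for a real tokenizer.

```python
# Hypothetical special tokens; these names are assumptions, not taken from any library.
START, DELIM, EXTRACT = "<s>", "$", "<e>"


def build_decoder_input(premise, hypothesis):
    """Join premise and hypothesis into a single token sequence."""
    # Whitespace splitting is a stand-in for a real subword tokenizer.
    return [START] + premise.split() + [DELIM] + hypothesis.split() + [EXTRACT]


tokens = build_decoder_input("A person on a horse", "A person is outdoors")
print(tokens)
```

The point is only that both sentences travel through the same model as one sequence, with a delimiter marking where the premise ends.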

According to this paper, training works with token-delimited input even without having an Encoder output h_n and feeding it into a Decoder.

Now, let's look at the details!

Introduction

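About the task: this paper uses RTE (Recognizing Textual Entailment).

I had never heard of RTE, so I looked it up. Given a premise sentence, the task is to decide whether a hypothesis sentence is (1) a contradiction, (2) neutral (unrelated), or (3) an entailment of the premise.

Here is what I found in the code on GitHub.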

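import os

import pandas as pd


def load_dataset(dataset_dir):
    """Load the SNLI train/dev/test splits from tab-separated files."""
    print("Loading SNLI dataset")
    dataset = {}
    for split in ['train', 'dev', 'test']:
        # Files are named snli_1.0_train.txt, snli_1.0_dev.txt, snli_1.0_test.txt
        split_path = os.path.join(dataset_dir, 'snli_1.0_{}.txt'.format(split))
        df = pd.read_csv(split_path, delimiter='\t')
        # Keep only the gold label and the two raw sentences
        dataset[split] = {
            "premises": df[["sentence1"]].values,
            "hypothesis": df[["sentence2"]].values,
            "targets": df[["gold_label"]].values
        }

    return dataset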

So that seems to be how the data gets loaded.

I then looked up this SNLI dataset. SNLI stands for The Stanford Natural Language Inference Corpus.

Pulling out the contents gives this.

gold_label   sentence1_binary_parse  sentence2_binary_parse  sentence1_parse sentence2_parse sentence1   sentence2   captionID   pairID  label1  label2  label3  label4  label5
neutral ( ( ( A person ) ( on ( a horse ) ) ) ( ( jumps ( over ( a ( broken ( down airplane ) ) ) ) ) . ) ) ( ( A person ) ( ( is ( ( training ( his horse ) ) ( for ( a competition ) ) ) ) . ) )  (ROOT (S (NP (NP (DT A) (NN person)) (PP (IN on) (NP (DT a) (NN horse)))) (VP (VBZ jumps) (PP (IN over) (NP (DT a) (JJ broken) (JJ down) (NN airplane)))) (. .)))   (ROOT (S (NP (DT A) (NN person)) (VP (VBZ is) (VP (VBG training) (NP (PRP$ his) (NN horse)) (PP (IN for) (NP (DT a) (NN competition))))) (. .)))    A person on a horse jumps over a broken down airplane.  A person is training his horse for a competition.   3416050480.jpg#4    3416050480.jpg#4r1n neutral             
contradiction   ( ( ( A person ) ( on ( a horse ) ) ) ( ( jumps ( over ( a ( broken ( down airplane ) ) ) ) ) . ) ) ( ( A person ) ( ( ( ( is ( at ( a diner ) ) ) , ) ( ordering ( an omelette ) ) ) . ) ) (ROOT (S (NP (NP (DT A) (NN person)) (PP (IN on) (NP (DT a) (NN horse)))) (VP (VBZ jumps) (PP (IN over) (NP (DT a) (JJ broken) (JJ down) (NN airplane)))) (. .)))   (ROOT (S (NP (DT A) (NN person)) (VP (VBZ is) (PP (IN at) (NP (DT a) (NN diner))) (, ,) (S (VP (VBG ordering) (NP (DT an) (NN omelette))))) (. .))) A person on a horse jumps over a broken down airplane.  A person is at a diner, ordering an omelette.   3416050480.jpg#4    3416050480.jpg#4r1c contradiction               
entailment  ( ( ( A person ) ( on ( a horse ) ) ) ( ( jumps ( over ( a ( broken ( down airplane ) ) ) ) ) . ) ) ( ( A person ) ( ( ( ( is outdoors ) , ) ( on ( a horse ) ) ) . ) ) (ROOT (S (NP (NP (DT A) (NN person)) (PP (IN on) (NP (DT a) (NN horse)))) (VP (VBZ jumps) (PP (IN over) (NP (DT a) (JJ broken) (JJ down) (NN airplane)))) (. .)))   (ROOT (S (NP (DT A) (NN person)) (VP (VBZ is) (ADVP (RB outdoors)) (, ,) (PP (IN on) (NP (DT a) (NN horse)))) (. .)))   A person on a horse jumps over a broken down airplane.  A person is outdoors, on a horse.   3416050480.jpg#4    3416050480.jpg#4r1e entailment              
neutral ( Children ( ( ( smiling and ) waving ) ( at camera ) ) )   ( They ( are ( smiling ( at ( their parents ) ) ) ) )   (ROOT (NP (S (NP (NNP Children)) (VP (VBG smiling) (CC and) (VBG waving) (PP (IN at) (NP (NN camera)))))))  (ROOT (S (NP (PRP They)) (VP (VBP are) (VP (VBG smiling) (PP (IN at) (NP (PRP$ their) (NNS parents)))))))   Children smiling and waving at camera   They are smiling at their parents   2267923837.jpg#2    2267923837.jpg#2r1n neutral             
entailment  ( Children ( ( ( smiling and ) waving ) ( at camera ) ) )   ( There ( ( are children ) present ) )  (ROOT (NP (S (NP (NNP Children)) (VP (VBG smiling) (CC and) (VBG waving) (PP (IN at) (NP (NN camera)))))))  (ROOT (S (NP (EX There)) (VP (VBP are) (NP (NNS children)) (ADVP (RB present)))))   Children smiling and waving at camera   There are children present  2267923837.jpg#2    2267923837.jpg#2r1e entailment

It's super hard to read, but only three columns are actually used: gold_label, sentence1, and sentence2. Perfectly ordinary.
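As a small illustration of keeping just those three columns, here is a sketch that parses a tab-separated sample with the standard `csv` module. The in-memory `SAMPLE` is a cut-down stand-in for the real file (fewer columns), and `LABEL_TO_ID` with its ordering, plus the function name `read_examples`, are my own assumptions rather than anything from the repository.

```python
import csv
import io

# Cut-down stand-in for snli_1.0_*.txt; the real file has many more columns.
SAMPLE = (
    "gold_label\tsentence1\tsentence2\tpairID\n"
    "neutral\tChildren smiling and waving at camera\t"
    "They are smiling at their parents\t2267923837.jpg#2r1n\n"
    "entailment\tChildren smiling and waving at camera\t"
    "There are children present\t2267923837.jpg#2r1e\n"
)

# Arbitrary label order chosen for this sketch (an assumption).
LABEL_TO_ID = {"entailment": 0, "neutral": 1, "contradiction": 2}


def read_examples(tsv_text):
    """Return (premise, hypothesis, label_id) tuples from tab-separated text."""
    reader = csv.DictReader(
        io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE
    )
    examples = []
    for row in reader:
        label = row["gold_label"]
        if label not in LABEL_TO_ID:
            continue  # skip rows whose label is not one of the three classes
        examples.append((row["sentence1"], row["sentence2"], LABEL_TO_ID[label]))
    return examples


examples = read_examples(SAMPLE)
for premise, hypothesis, label_id in examples:
    print(label_id, "|", premise, "|", hypothesis)
```

Everything else in each row (parse trees, caption IDs, annotator labels) is simply ignored.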

Also, "a person on a horse jumps over a broken down airplane" is way too much of a fantasy (lol).

Methods

The paper uses an LSTM with Attention model.

LSTM explanations turn up everywhere if you google it, so I'll skip that.

The key points of this paper seem to be these three:

1. Conditional Encoding

2. Attention

3. Word by Word Attention

[Figure: model]

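To compute the output $o_N$ for the label, we need the formula $o_N = \sigma(W^o H + b^o)$, where $H$ is given by $H = [x_N, h_{N-1}]$, the concatenation of the current input and the previous hidden state. So it looks like attention only needs to be applied to $h_{N-1}$.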
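To check my reading of that formula, here is a tiny sketch that evaluates $\sigma(W^o H + b^o)$ with plain Python lists. The dimension `k = 3`, the random weights, and the names `compute_o`, `x_N`, and `h_prev` are all placeholders I made up; nothing here is a trained model.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def compute_o(x, h_prev, W_o, b_o):
    """Evaluate sigmoid(W_o @ [x, h_prev] + b_o) row by row."""
    H = x + h_prev  # list concatenation plays the role of H = [x, h_prev]
    return [
        sigmoid(sum(w * v for w, v in zip(row, H)) + b)
        for row, b in zip(W_o, b_o)
    ]


random.seed(0)
k = 3  # hypothetical hidden size
x_N = [0.5, -1.0, 0.25]    # made-up input vector at step N
h_prev = [0.1, 0.0, -0.3]  # made-up hidden state from step N-1
W_o = [[random.uniform(-1, 1) for _ in range(2 * k)] for _ in range(k)]
b_o = [0.0] * k

o_N = compute_o(x_N, h_prev, W_o, b_o)
print(o_N)
```

Because $H$ stacks the input and the previous hidden state, $W^o$ has $2k$ columns here, and every component of the result lies between 0 and 1.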