\documentclass[french]{article}
\usepackage[T1]{fontenc}
\usepackage[utf8]{inputenc}
\usepackage{lmodern}
\usepackage[a4paper]{geometry}
\usepackage{babel}
\usepackage{url}
\usepackage{dblfloatfix}
\usepackage{booktabs}

\begin{document}

\begin{center}
\LARGE
{\bf Natural Language Inference} \\[5mm]
\Large
\bf Machine Learning for Language Processing Project \\
{\bf 2018/2019} \\[2mm]
\end{center}

\vspace{1cm}

\section{Project Description}

This project comprises two tasks: sentence entailment (a classification task) and sentence relatedness (a regression task).

The task of sentence entailment (SICK-E) is to predict whether the relationship between two sentences is \textbf{entailment}, \textbf{neutral} or \textbf{contradiction}.

The task of sentence relatedness (SICK-R) is to predict the \textbf{relatedness score} between two sentences.
This score ranges from 1.0 to 5.0.

The goal of this project is to implement a machine learning model for SICK-E and/or SICK-R.

\section{Data Description}

Details about this dataset are available at the following address: \url{http://clic.cimec.unitn.it/composes/sick.html}

File format (tab-separated):

{ \scriptsize
\begin{center}
\begin{tabular}{lllcc}
pair\_ID & sentence\_A & sentence\_B & relatedness\_score & entailment\_judgment \\
93 & A lone biker is jumping in the air & A man is jumping into a full pool & 1.7 & NEUTRAL
\end{tabular}
\end{center}
}

The provided files are described in Table~\ref{table:data}.

\begin{table}[htbp]
\begin{center}
\begin{tabular}{|c|l|c|}
\toprule
Name & File & \# Sent. pairs \\
\midrule
Train & SICK\_train.txt & 4501 \\
Dev & SICK\_trial.txt & 501 \\
Test & SICK\_test.txt & 4928 \\
\bottomrule
\end{tabular}
\end{center}
\caption{\label{table:data}Description of the data}
\end{table}

\section{Evaluation}

For SICK-E, systems are evaluated on classification accuracy (the percentage of labels predicted correctly) over all sentence pairs.
We are also interested in the per-class precision and recall scores, as well as a confusion matrix.

For SICK-R, systems are evaluated using the Pearson correlation coefficient between the predicted and gold relatedness scores (see \texttt{scipy.stats.pearsonr}).

\section{Project Roadmap}

\begin{enumerate}
\item Preprocess and prepare the training data.
\item Train, optimize and evaluate a baseline deep recurrent neural network (RNN) using PyTorch. \textbf{[one per group]}
\item Each student should propose \textbf{one} enhancement to the baseline model (additional data, regularization, network initialization, new architecture, etc.). \textbf{[one per student]}
\item Prepare the final defense: present your model and the results obtained.
\end{enumerate}

\section{References}

\begin{itemize}
\item SICK webpage: \url{http://clic.cimec.unitn.it/composes/sick.html}
\item Conneau and Kiela, 2018. \textbf{SentEval: An Evaluation Toolkit for Universal Sentence Representations}
  \begin{itemize}
  \item \url{https://arxiv.org/abs/1803.05449}
  \end{itemize}
\end{itemize}

\end{document}