\documentclass[english]{article}
\usepackage[T1]{fontenc}
\usepackage[utf8]{inputenc}
\usepackage{lmodern}
\usepackage[a4paper]{geometry}
\usepackage{babel}
\usepackage{url}
\usepackage{dblfloatfix}
\usepackage{booktabs}

\begin{document}

\begin{center}
	\LARGE
	{\bf Natural Language Inference} \\[5mm]
	\Large
	\bf Machine Learning for Language Processing Project \\
	{\bf 2018/2019} \\[2mm]
\end{center}

\vspace{1cm}

\section{Project Description}

This project comprises two tasks: sentence entailment (a classification task) and sentence relatedness (a regression task).

The task of sentence entailment (SICK-E) is to predict whether the relation between two sentences is \textbf{entailment}, \textbf{neutral} or \textbf{contradiction}.
The task of sentence relatedness (SICK-R) is to predict the \textbf{relatedness score} between two sentences. This score ranges from 1.0 to 5.0.

The goal of this project is to implement a machine learning model for SICK-E and/or SICK-R. 


\section{Data Description}

Details about this dataset are available at \url{http://clic.cimec.unitn.it/composes/sick.html}.

The files are tab-separated, with a header line followed by one sentence pair per line:

{ \scriptsize
\begin{center}
\begin{tabular}{lllcc}
\toprule
pair\_ID & sentence\_A & sentence\_B & relatedness\_score & entailment\_judgment \\
\midrule
93 & A lone biker is jumping in the air & A man is jumping into a full pool & 1.7 & NEUTRAL \\
\bottomrule
\end{tabular}
\end{center}
}
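
As an illustration, a minimal sketch of a loader for this format is given below. It assumes the column names of the header shown above and uses only the Python standard library; the function name \texttt{read\_sick} and the dictionary keys are hypothetical choices, not part of the dataset.

\begin{verbatim}
```python
import csv
import io


def read_sick(handle):
    """Parse a tab-separated SICK file into a list of dicts.

    `handle` is any iterable of text lines whose first line is the header.
    """
    # QUOTE_NONE: treat quote characters in sentences as ordinary text.
    reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    pairs = []
    for row in reader:
        pairs.append({
            "pair_id": int(row["pair_ID"]),
            "sentence_a": row["sentence_A"],
            "sentence_b": row["sentence_B"],
            "score": float(row["relatedness_score"]),
            "label": row["entailment_judgment"],
        })
    return pairs


# Self-contained demo on the example row from the table above.
sample = (
    "pair_ID\tsentence_A\tsentence_B\trelatedness_score\tentailment_judgment\n"
    "93\tA lone biker is jumping in the air\t"
    "A man is jumping into a full pool\t1.7\tNEUTRAL\n"
)
pairs = read_sick(io.StringIO(sample))
print(pairs[0]["label"], pairs[0]["score"])
```
\end{verbatim}

With a real file, the same function could be called on an open file object, for example \texttt{open("SICK\_train.txt", encoding="utf-8", newline="")}.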

The provided files are described in Table~\ref{table:data}.
\begin{table}[htbp]
\centering
\begin{tabular}{llr}
\toprule
Name & File & \# Sentence pairs \\
\midrule
Train & SICK\_train.txt & 4501 \\
Dev & SICK\_trial.txt & 501 \\
Test & SICK\_test.txt & 4928 \\
\bottomrule
\end{tabular}
\caption{\label{table:data}Description of the data}
\end{table}


\section{Evaluation}

For SICK-E, systems are evaluated on classification accuracy (the percentage of labels that are predicted correctly) over all sentence pairs.
We are also interested in the precision and recall scores for each class, as well as a confusion matrix.
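
The SICK-E metrics can be sketched in plain Python as follows. This is an illustrative helper rather than a required implementation: the function name \texttt{classification\_report} is hypothetical, and the label strings \texttt{ENTAILMENT} and \texttt{CONTRADICTION} are assumed by analogy with the \texttt{NEUTRAL} value shown in the data example.

\begin{verbatim}
```python
from collections import Counter

# Assumed label strings; only NEUTRAL appears in the example row.
LABELS = ["ENTAILMENT", "NEUTRAL", "CONTRADICTION"]


def classification_report(gold, predicted, labels=LABELS):
    """Return accuracy, per-class precision/recall and a confusion matrix."""
    # confusion[(g, p)] counts pairs with gold label g predicted as p.
    confusion = Counter(zip(gold, predicted))
    accuracy = sum(confusion[(lab, lab)] for lab in labels) / len(gold)
    per_class = {}
    for label in labels:
        true_pos = confusion[(label, label)]
        n_predicted = sum(confusion[(g, label)] for g in labels)
        n_gold = sum(confusion[(label, p)] for p in labels)
        per_class[label] = {
            # Report 0.0 when a class is never predicted or never occurs.
            "precision": true_pos / n_predicted if n_predicted else 0.0,
            "recall": true_pos / n_gold if n_gold else 0.0,
        }
    # Rows are gold labels, columns are predicted labels, in `labels` order.
    matrix = [[confusion[(g, p)] for p in labels] for g in labels]
    return accuracy, per_class, matrix


gold = ["NEUTRAL", "NEUTRAL", "ENTAILMENT", "CONTRADICTION"]
pred = ["NEUTRAL", "ENTAILMENT", "ENTAILMENT", "CONTRADICTION"]
accuracy, per_class, matrix = classification_report(gold, pred)
print(accuracy)
print(matrix)
```
\end{verbatim}

In this toy example one \texttt{NEUTRAL} pair is misclassified as \texttt{ENTAILMENT}, which lowers the precision of \texttt{ENTAILMENT} and the recall of \texttt{NEUTRAL}.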

For SICK-R, systems are evaluated using the Pearson correlation coefficient between predicted and gold scores (see \texttt{scipy.stats.pearsonr}).
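
To make the SICK-R metric concrete, here is a small dependency-free sketch of the Pearson correlation coefficient. The function name \texttt{pearson} and the toy scores are hypothetical; in practice \texttt{scipy.stats.pearsonr} can be used directly.

\begin{verbatim}
```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations, and sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    # Raises ZeroDivisionError if either sequence is constant.
    return cov / math.sqrt(var_x * var_y)


# Toy gold and predicted relatedness scores (made up for illustration).
gold = [1.7, 3.2, 4.5, 2.0]
predicted = [2.0, 3.0, 4.8, 2.4]
r = pearson(gold, predicted)
print(round(r, 3))
```
\end{verbatim}

The first value returned by \texttt{scipy.stats.pearsonr(gold, predicted)} should agree with \texttt{r} up to floating-point error. Note that a model predicting a constant score has an undefined correlation.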

\section{Project Roadmap}
\begin{enumerate}
\item Preprocess and prepare the training data
\item Train, optimize and evaluate a baseline deep recurrent neural network (RNN) using PyTorch. \textbf{[one per group]}
\item Each student should propose \textbf{one} enhancement to the baseline model (additional data, regularization, network initialization, new architecture, etc.) \textbf{[one per student]}
\item Prepare the final defense: present your model and the results obtained
\end{enumerate}
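
The preprocessing step could look like the sketch below, which builds a vocabulary and converts sentences into fixed-length lists of token ids. Everything in it is an assumption made for illustration: the whitespace tokenizer, the special tokens \texttt{<pad>} and \texttt{<unk>}, the label-to-id mapping and all function names. Other choices are equally valid.

\begin{verbatim}
```python
from collections import Counter

# Hypothetical special tokens and label mapping (not fixed by the dataset).
PAD, UNK = "<pad>", "<unk>"
LABEL_TO_ID = {"ENTAILMENT": 0, "NEUTRAL": 1, "CONTRADICTION": 2}


def tokenize(sentence):
    """Lowercase and split on whitespace (a deliberately simple tokenizer)."""
    return sentence.lower().split()


def build_vocab(sentences, min_count=1):
    """Map each token seen at least `min_count` times to an integer id."""
    counts = Counter(tok for s in sentences for tok in tokenize(s))
    vocab = {PAD: 0, UNK: 1}
    # Sort tokens so that ids are reproducible across runs.
    for token, count in sorted(counts.items()):
        if count >= min_count:
            vocab[token] = len(vocab)
    return vocab


def encode(sentence, vocab, max_len):
    """Convert a sentence to exactly `max_len` ids (truncate, then pad)."""
    ids = [vocab.get(tok, vocab[UNK]) for tok in tokenize(sentence)]
    ids = ids[:max_len]
    return ids + [vocab[PAD]] * (max_len - len(ids))


train_sentences = [
    "A lone biker is jumping in the air",
    "A man is jumping into a full pool",
]
vocab = build_vocab(train_sentences)
encoded = encode("A biker is flying", vocab, max_len=6)
print(len(vocab), encoded)
```
\end{verbatim}

The vocabulary should be built from the training set only, so that unseen words in the development and test sets map to the unknown token. The resulting id lists can then be wrapped in tensors and fed to an embedding layer of the baseline RNN.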


\section{References}
\begin{itemize}
\item SICK webpage: \url{http://clic.cimec.unitn.it/composes/sick.html}
\item Conneau and Kiela, 2018. \textbf{SentEval: An Evaluation Toolkit for Universal Sentence Representations}.
	\begin{itemize}
		\item \url{https://arxiv.org/abs/1803.05449}
	\end{itemize}
\end{itemize}
\end{document}