*.aux
*.log
*.out
*.synctex.gz
\documentclass[english]{article}
\usepackage[T1]{fontenc}
\usepackage[utf8]{inputenc}
\usepackage{lmodern}
\usepackage[a4paper]{geometry}
\usepackage{babel}
\usepackage{url}
\usepackage{dblfloatfix}
\usepackage{booktabs}
\begin{document}
\begin{center}
\LARGE
\textbf{Natural Language Inference} \\[5mm]
\Large
\textbf{Project: Machine Learning for Languages} \\
\textbf{2018/2019} \\[2mm]
\end{center}
\vspace{1cm}
\section{Project Description}
This project comprises two tasks: sentence entailment (a classification task) and sentence relatedness (a regression task).
The task of sentence entailment (SICK-E) is to predict whether the relation between two sentences is \textbf{entailment}, \textbf{neutral} or \textbf{contradiction}.
The task of sentence relatedness (SICK-R) is to predict the \textbf{relatedness score} between two sentences. This score ranges from 1.0 to 5.0.
The goal of this project is to implement a machine learning model for SICK-E and/or SICK-R.
\section{Data Description}
Details for this dataset are available at the following address: \url{http://clic.cimec.unitn.it/composes/sick.html}

File format (tab-separated):
{ \scriptsize
\begin{center}
\begin{tabular}{lllcc}
\toprule
pair\_ID & sentence\_A & sentence\_B & relatedness\_score & entailment\_judgment \\
\midrule
93 & A lone biker is jumping in the air & A man is jumping into a full pool & 1.7 & NEUTRAL \\
\bottomrule
\end{tabular}
\end{center}
}
The provided files are described in Table~\ref{table:data}.
\begin{table}[htbp]
\begin{center}
\begin{tabular}{llr}
\toprule
Name & File & \# Sentence pairs \\
\midrule
Train & SICK\_train.txt & 4501 \\
Dev & SICK\_trial.txt & 501 \\
Test & SICK\_test.txt & 4928 \\
\bottomrule
\end{tabular}
\end{center}
\caption{\label{table:data}Description of the data}
\end{table}
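
As an illustration of the tab-separated format described above, the following is a minimal sketch of a reader written with the Python standard library only. The function name \texttt{read\_sick} is hypothetical, and the sketch assumes that every file starts with the header row shown in the example and that fields are never quoted.

\begin{verbatim}
```python
import csv
import io


def read_sick(handle):
    """Parse SICK-style tab-separated rows into a list of dicts.

    Assumes the first line is the header row
    (pair_ID, sentence_A, sentence_B, relatedness_score, entailment_judgment).
    """
    reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    pairs = []
    for row in reader:
        pairs.append({
            "pair_ID": int(row["pair_ID"]),
            "sentence_A": row["sentence_A"],
            "sentence_B": row["sentence_B"],
            "relatedness_score": float(row["relatedness_score"]),
            "entailment_judgment": row["entailment_judgment"],
        })
    return pairs


# In-memory sample built from the example row of the handout. A real run
# would pass open("SICK_train.txt", encoding="utf-8", newline="") instead.
sample = (
    "pair_ID\tsentence_A\tsentence_B\trelatedness_score\tentailment_judgment\n"
    "93\tA lone biker is jumping in the air\t"
    "A man is jumping into a full pool\t1.7\tNEUTRAL\n"
)
pairs = read_sick(io.StringIO(sample))
print(pairs[0]["entailment_judgment"], pairs[0]["relatedness_score"])
```
\end{verbatim}

Keeping the score as a float and the judgment as a string lets the same parsed rows serve both SICK-R and SICK-E.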
\section{Evaluation}
For SICK-E, systems are evaluated on classification accuracy (the percentage of labels predicted correctly) over all sentence pairs.
We are also interested in the per-class precision and recall scores, as well as a confusion matrix.
For SICK-R, systems are evaluated using the Pearson correlation coefficient between predicted and gold relatedness scores (see \texttt{scipy.stats.pearsonr}).
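
The metrics above can be sketched in plain Python as follows. This is an illustrative sketch, not a reference scorer: all function names are hypothetical, the label strings \texttt{ENTAILMENT} and \texttt{CONTRADICTION} are assumed to follow the same upper-case convention as the \texttt{NEUTRAL} label of the example row, and the toy predictions are invented for the demonstration. The \texttt{pearson} function computes the same coefficient as \texttt{scipy.stats.pearsonr}, without the p-value.

\begin{verbatim}
```python
import math
from collections import Counter

# Assumed label strings; only NEUTRAL appears in the handout's example row.
LABELS = ["ENTAILMENT", "NEUTRAL", "CONTRADICTION"]


def accuracy(gold, pred):
    """Fraction of pairs whose predicted label equals the gold label."""
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def confusion_matrix(gold, pred, labels=LABELS):
    """Rows are gold labels, columns are predicted labels."""
    counts = Counter(zip(gold, pred))
    return [[counts[(g, p)] for p in labels] for g in labels]


def precision_recall(gold, pred, labels=LABELS):
    """Map each label to (precision, recall); 0.0 when undefined."""
    matrix = confusion_matrix(gold, pred, labels)
    scores = {}
    for i, label in enumerate(labels):
        true_pos = matrix[i][i]
        n_predicted = sum(row[i] for row in matrix)  # column sum
        n_gold = sum(matrix[i])                      # row sum
        precision = true_pos / n_predicted if n_predicted else 0.0
        recall = true_pos / n_gold if n_gold else 0.0
        scores[label] = (precision, recall)
    return scores


def pearson(x, y):
    """Pearson correlation; undefined (division by zero) if x or y is constant."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


# Toy data, invented for illustration only.
gold = ["NEUTRAL", "ENTAILMENT", "CONTRADICTION", "NEUTRAL"]
pred = ["NEUTRAL", "NEUTRAL", "CONTRADICTION", "NEUTRAL"]
print(accuracy(gold, pred))
print(confusion_matrix(gold, pred))
print(precision_recall(gold, pred))
```
\end{verbatim}
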
\section{Project Roadmap}
\begin{enumerate}
\item Preprocess and prepare the training data
\item Train, optimize and evaluate a baseline deep recurrent neural network (RNN) using PyTorch. \textbf{[one per group]}
\item Each student should propose \textbf{one} enhancement to the baseline model (additional data, regularization, network initialization, new architecture, etc.) \textbf{[one per student]}
\item Prepare the final defense: present your model and the obtained results
\end{enumerate}
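
The first step of the roadmap could be sketched as follows. This is only one possible preprocessing pipeline: lowercasing, whitespace tokenization, the \texttt{<pad>} and \texttt{<unk>} symbols and all function names are assumptions made for the illustration, not requirements of the project.

\begin{verbatim}
```python
from collections import Counter

PAD, UNK = "<pad>", "<unk>"  # assumed special symbols


def tokenize(sentence):
    """Lowercase and split on whitespace (a deliberately simple choice)."""
    return sentence.lower().split()


def build_vocab(sentences, min_count=1):
    """Map each token seen at least min_count times to an integer id."""
    counts = Counter(tok for s in sentences for tok in tokenize(s))
    vocab = {PAD: 0, UNK: 1}
    for tok, n in sorted(counts.items()):
        if n >= min_count:
            vocab[tok] = len(vocab)
    return vocab


def encode(sentence, vocab):
    """Convert a sentence to token ids, using UNK for unseen tokens."""
    return [vocab.get(tok, vocab[UNK]) for tok in tokenize(sentence)]


def pad_batch(sequences, pad_id=0):
    """Right-pad every id sequence to the length of the longest one."""
    width = max(len(seq) for seq in sequences)
    return [seq + [pad_id] * (width - len(seq)) for seq in sequences]


train_sentences = [
    "A lone biker is jumping in the air",
    "A man is jumping into a full pool",
]
vocab = build_vocab(train_sentences)
# "dog" is not in the training sentences, so it is mapped to the UNK id.
batch = pad_batch([
    encode(train_sentences[0], vocab),
    encode("A dog is jumping", vocab),
])
print(len(vocab), batch)
```
\end{verbatim}

Building the vocabulary from the training sentences only keeps the Dev and Test files unseen, and the padded lists of ids all have the same length, which is the rectangular shape a batched RNN input typically needs.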
\section{References}
\begin{itemize}
\item SICK webpage: \url{http://clic.cimec.unitn.it/composes/sick.html}
\item Conneau and Kiela, 2018. \textbf{SentEval: An Evaluation Toolkit for Universal Sentence Representations}
\begin{itemize}
\item \url{https://arxiv.org/abs/1803.05449}
\end{itemize}
\end{itemize}
\end{document}