@@ -42,32 +42,15 @@ We will concentrate on four types of named entities: persons, locations, organiz
\section{Data Description}
The dataset is a corpus of movie reviews originally collected by Pang and Lee [1]. It consists of tab-separated files containing phrases from the Rotten Tomatoes dataset. The data are split into \textbf{train/test} sets, and the sentences have been shuffled from their original order.
~\\
\begin{itemize}
\item Each sentence has been parsed into many \textbf{phrases} by the Stanford parser.
\item Each phrase has a \textbf{PhraseId}.
\item Each sentence has a \textbf{SentenceId}.
\item Phrases that are repeated (such as short/common words) are only included once in the data.
\end{itemize}
~\\
The training set contains 156000 examples and the test set comprises 66300 phrases. The following table shows several phrases and their sentiment scores.
\caption{Example of phrases from the training data and their scores}
\end{table}
We will use the CoNLL-2003 shared task data files. Each file contains four columns separated by a single space. Each word is placed on its own line, and an empty line follows each sentence. The first item on each line is the word, the second its part-of-speech (POS) tag, the third its syntactic chunk tag, and the fourth its named entity tag. The chunk tags and the named entity tags have the format I-TYPE, which means that the word is inside a phrase of type TYPE. Only when two phrases of the same type immediately follow each other does the first word of the second phrase receive the tag B-TYPE, to show that it starts a new phrase. A word with tag O is not part of any phrase.\\
The data consist of three files: one training file and two test files, testa and testb. The first test file is used during the development phase to find good parameters for the learning system. The second test file is used for the final evaluation.
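
The file format and tagging scheme described above can be illustrated with a minimal Python sketch. It is only one possible way to read the data, not a reference implementation. The function names \texttt{read\_conll} and \texttt{extract\_entities} are hypothetical, and skipping lines that start with \texttt{-DOCSTART-} is an assumption about document separators in the files, which the description above does not mention.

```python
from typing import Iterable, Iterator, List, Tuple

# One token: (word, POS tag, chunk tag, named entity tag)
Token = Tuple[str, str, str, str]


def read_conll(lines: Iterable[str]) -> Iterator[List[Token]]:
    """Yield each sentence as a list of (word, pos, chunk, ne) tuples."""
    sentence: List[Token] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            # An empty line marks the end of a sentence.
            if sentence:
                yield sentence
                sentence = []
            continue
        if line.startswith("-DOCSTART-"):
            # Assumption: document separator lines carry no sentence content.
            continue
        word, pos, chunk, ne = line.split(" ")
        sentence.append((word, pos, chunk, ne))
    if sentence:
        # The last sentence may not be followed by an empty line.
        yield sentence


def extract_entities(sentence: List[Token]) -> List[Tuple[str, str]]:
    """Group named entity tags into (TYPE, text) pairs.

    I-TYPE continues a phrase of the same type; B-TYPE or a change of
    type starts a new phrase; O closes the current phrase.
    """
    entities: List[Tuple[str, str]] = []
    words: List[str] = []
    current_type = None
    for word, _pos, _chunk, tag in sentence:
        if tag == "O":
            if words:
                entities.append((current_type, " ".join(words)))
            words, current_type = [], None
            continue
        prefix, _, tag_type = tag.partition("-")
        if prefix == "B" or tag_type != current_type:
            # Close the running phrase, if any, and start a new one.
            if words:
                entities.append((current_type, " ".join(words)))
            words, current_type = [word], tag_type
        else:
            words.append(word)
    if words:
        entities.append((current_type, " ".join(words)))
    return entities
```

To process a file, pass an open file handle to \texttt{read\_conll} and call \texttt{extract\_entities} on each sentence it yields. The same reader can serve the training file and both test files, since all three share the four-column layout.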
@@ -86,7 +86,7 @@ Ci-dessous quelques points pour simplifier l'organisation et l'échange entre le
\end{enumerate}
Your work will be assessed through project defenses, held after the end of the Apprentissage automatique en langue module for work-study students and after the end of the multilinguisme module for full-time students.\\