Commit eb4c5bd9 authored by Fethi Bougares

SentAna project

parent 0356c97c
\catcode `:\active
\catcode `;\active
\catcode `!\active
\catcode `?\active
\@writefile{toc}{\contentsline {section}{\numberline {1}Project Description}{1}}
\@writefile{toc}{\contentsline {section}{\numberline {2}Data Description }{1}}
\@writefile{lot}{\contentsline {table}{\numberline {1}{\ignorespaces Example of phrases from the training data and their scores}}{1}}
\@writefile{toc}{\contentsline {section}{\numberline {3}Evaluation}{2}}
\@writefile{toc}{\contentsline {section}{\numberline {4}Project Roadmap}{2}}
{\bf Movie Review Sentiment Analysis} \\[5mm]
\bf Projet App Auto en langues \\
{\bf 2018/2019} \\[2mm]
\section{Project Description}
The aim of this project is to implement a machine learning model for a sentiment analysis task using the Rotten Tomatoes movie review dataset. During this project you are asked to label phrases on a scale of five values:
\item negative
\item somewhat negative
\item neutral
\item somewhat positive
\item positive
Obstacles like sentence negation, sarcasm, terseness, language ambiguity, and many others make this task very challenging.
\section{Data Description }
The dataset is a corpus of movie reviews originally collected by Pang and Lee [1]. It consists of tab-separated files containing phrases from the Rotten Tomatoes dataset. The data are split into \textbf{train/test} sets, and the sentences have been shuffled from their original order.
\item Each sentence has been parsed into many \textbf{phrases} by the Stanford parser.
\item Each phrase has a \textbf{PhraseId}.
\item Each sentence has a \textbf{SentenceId}.
\item Phrases that are repeated (such as short/common words) are only included once in the data.
The training set contains 156000 examples and the test set contains 66300 phrases. The following table shows several phrases and their sentiment scores.
PhraseId & SentenceId & Phrase& Sentiment\\ \hline
8140& 336 &of inept filmmaking &1\\ \hline
8143& 336 &joyless , idiotic , annoying , heavy-handed ,& 0\\ \hline
8146& 336 &joyless& 0 \\ \hline
8147& 336 &idiotic , annoying , heavy-handed& 2\\ \hline
\caption{Example of phrases from the training data and their scores}
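The tab-separated layout described above can be read with Python's standard \texttt{csv} module. The following is a minimal sketch, not the project's reference loader: the header names are taken from the table columns, the sample rows are copied from the table, and the mapping of scores 0 to 4 onto the five labels is an assumption that follows the order of the list in the Project Description. The function name \texttt{read\_phrases} is hypothetical.

```python
import csv
import io

# Sample rows copied from the table above. The header line is an assumption
# based on the table's column names; a real run would open the training file.
SAMPLE_TSV = (
    "PhraseId\tSentenceId\tPhrase\tSentiment\n"
    "8140\t336\tof inept filmmaking\t1\n"
    "8143\t336\tjoyless , idiotic , annoying , heavy-handed ,\t0\n"
    "8146\t336\tjoyless\t0\n"
    "8147\t336\tidiotic , annoying , heavy-handed\t2\n"
)

# Assumed mapping: scores 0..4 in the same order as the label list.
LABEL_NAMES = {
    0: "negative",
    1: "somewhat negative",
    2: "neutral",
    3: "somewhat positive",
    4: "positive",
}


def read_phrases(handle):
    """Parse tab-separated phrase records into a list of dictionaries."""
    # QUOTE_NONE keeps quotation marks inside phrases as literal characters.
    reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    rows = []
    for record in reader:
        rows.append({
            "PhraseId": int(record["PhraseId"]),
            "SentenceId": int(record["SentenceId"]),
            "Phrase": record["Phrase"],
            "Sentiment": int(record["Sentiment"]),
        })
    return rows


rows = read_phrases(io.StringIO(SAMPLE_TSV))
for row in rows:
    print(row["PhraseId"], LABEL_NAMES[row["Sentiment"]], "|", row["Phrase"])
```

Passing an open file handle instead of the in-memory sample should work the same way, provided the real file starts with the same header line.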
\section{Evaluation}
Systems are evaluated on classification accuracy (the percentage of labels predicted correctly) over all parsed phrases.
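As an illustration of this metric, the sketch below computes accuracy as a percentage over a list of gold labels and a list of predicted labels. The function name \texttt{accuracy} and the toy label lists are hypothetical and are not part of the project material.

```python
def accuracy(gold, predicted):
    """Return the percentage of positions where predicted equals gold."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    if not gold:
        raise ValueError("cannot compute accuracy on an empty list")
    correct = sum(1 for g, p in zip(gold, predicted) if g == p)
    return 100.0 * correct / len(gold)


# Toy example: three of the four phrase labels are predicted correctly.
gold = [1, 0, 0, 2]
predicted = [1, 0, 2, 2]
score = accuracy(gold, predicted)
print(f"accuracy = {score:.1f}%")
```

Every phrase counts equally under this metric, so short and frequent phrases weigh as much as full sentences.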
\section{Project Roadmap}
\item Study and plot the training data
\item Split the data (train/dev)
\item Train and evaluate a vanilla deep recurrent neural network (RNN)
\item Use the PyTorch framework to train the RNN.
\item Optimize the model and propose enhancements (regularization, network initialization, new architectures)
\item Prepare the final defense
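For the train/dev split step of the roadmap, one possible approach is to split by \textbf{SentenceId} rather than by individual phrase, so that phrases parsed from the same sentence never appear on both sides. This is a design suggestion and not a requirement stated in the project. The sketch below assumes each record is a dictionary with a \texttt{SentenceId} key; the function name \texttt{split\_by\_sentence}, the dev fraction, and the toy records are hypothetical.

```python
import random


def split_by_sentence(rows, dev_fraction=0.1, seed=0):
    """Split records into train/dev so that no SentenceId is shared."""
    sentence_ids = sorted({row["SentenceId"] for row in rows})
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    rng.shuffle(sentence_ids)
    n_dev = max(1, int(len(sentence_ids) * dev_fraction))
    dev_ids = set(sentence_ids[:n_dev])
    train = [row for row in rows if row["SentenceId"] not in dev_ids]
    dev = [row for row in rows if row["SentenceId"] in dev_ids]
    return train, dev


# Toy data: 20 sentences with 5 phrases each (not real project data).
toy_rows = [
    {"PhraseId": i, "SentenceId": i // 5, "Sentiment": i % 5}
    for i in range(100)
]
train_rows, dev_rows = split_by_sentence(toy_rows, dev_fraction=0.2)
print(len(train_rows), "train phrases,", len(dev_rows), "dev phrases")
```

A phrase-level random split would be simpler, but it may place sub-phrases of a dev sentence in the training set and make dev accuracy look better than it is.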