......@@ -6,6 +6,5 @@
\babel@aux{french}{}
\@writefile{toc}{\contentsline {section}{\numberline {1}Project Description}{1}}
\@writefile{toc}{\contentsline {section}{\numberline {2}Data Description }{1}}
\@writefile{lot}{\contentsline {table}{\numberline {1}{\ignorespaces Example of phrases from the training data and their scores}}{1}}
\@writefile{toc}{\contentsline {section}{\numberline {3}Evaluation}{2}}
\@writefile{toc}{\contentsline {section}{\numberline {4}Project Roadmap}{2}}
This is pdfTeX, Version 3.14159265-2.6-1.40.18 (TeX Live 2017/Debian) (preloaded format=pdflatex 2018.8.29) 3 OCT 2018 09:57
This is pdfTeX, Version 3.14159265-2.6-1.40.18 (TeX Live 2017/Debian) (preloaded format=pdflatex 2018.10.12) 25 OCT 2018 17:46
entering extended mode
restricted \write18 enabled.
%&-line parsing enabled.
......@@ -526,57 +526,33 @@ LaTeX Info: Redefining \degres on input line 8.
LaTeX Info: Redefining \dots on input line 8.
LaTeX Info: Redefining \up on input line 8.
Underfull \hbox (badness 10000) in paragraph at lines 45--47
Underfull \hbox (badness 10000) in paragraph at lines 46--47
[]
LaTeX Font Info: Try loading font information for OT1+lmr on input line 59.
(/usr/share/texmf/tex/latex/lm/ot1lmr.fd
File: ot1lmr.fd 2009/10/30 v1.6 Font defs for Latin Modern
)
LaTeX Font Info: Try loading font information for OML+lmm on input line 59.
(/usr/share/texmf/tex/latex/lm/omllmm.fd
File: omllmm.fd 2009/10/30 v1.6 Font defs for Latin Modern
)
LaTeX Font Info: Try loading font information for OMS+lmsy on input line 59.
Underfull \hbox (badness 10000) in paragraph at lines 50--54
(/usr/share/texmf/tex/latex/lm/omslmsy.fd
File: omslmsy.fd 2009/10/30 v1.6 Font defs for Latin Modern
)
LaTeX Font Info: Try loading font information for OMX+lmex on input line 59.
[]
(/usr/share/texmf/tex/latex/lm/omxlmex.fd
File: omxlmex.fd 2009/10/30 v1.6 Font defs for Latin Modern
)
LaTeX Font Info: External font `lmex10' loaded for size
(Font) <10> on input line 59.
LaTeX Font Info: External font `lmex10' loaded for size
(Font) <7> on input line 59.
LaTeX Font Info: External font `lmex10' loaded for size
(Font) <5> on input line 59.
[1
[1
{/var/lib/texmf/fonts/map/pdftex/updmap/pdftex.map}] [2] (./sujet1_NER.aux) )
Here is how much of TeX's memory you used:
2625 strings out of 492982
35035 string characters out of 6134896
109467 words of memory out of 5000000
6174 multiletter control sequences out of 15000+600000
36203 words of font info for 32 fonts, out of 8000000 for 9000
2549 strings out of 492982
33427 string characters out of 6134895
107465 words of memory out of 5000000
6122 multiletter control sequences out of 15000+600000
19403 words of font info for 20 fonts, out of 8000000 for 9000
1141 hyphenation exceptions out of 8191
29i,7n,30p,309b,423s stack positions out of 5000i,500n,10000p,200000b,80000s
29i,4n,30p,696b,423s stack positions out of 5000i,500n,10000p,200000b,80000s
{/
usr/share/texmf/fonts/enc/dvips/lm/lm-ec.enc}</usr/share/texmf/fonts/type1/publ
ic/lm/lmbx10.pfb></usr/share/texmf/fonts/type1/public/lm/lmbx12.pfb></usr/share
/texmf/fonts/type1/public/lm/lmcsc10.pfb></usr/share/texmf/fonts/type1/public/l
m/lmr10.pfb>
Output written on sujet1_NER.pdf (2 pages, 104987 bytes).
ic/lm/lmbx12.pfb></usr/share/texmf/fonts/type1/public/lm/lmr10.pfb>
Output written on sujet1_NER.pdf (2 pages, 61304 bytes).
PDF statistics:
28 PDF objects out of 1000 (max. 8388607)
19 compressed objects within 1 object stream
20 PDF objects out of 1000 (max. 8388607)
13 compressed objects within 1 object stream
0 named destinations out of 1000 (max. 500000)
1 words of extra memory for PDF output out of 10000 (max. 10000000)
......@@ -42,32 +42,15 @@ We will concentrate on four types of named entities: persons, locations, organiz
\section{Data Description }
The dataset is a corpus of movie reviews originally collected by Pang and Lee [1]. This dataset contains tab-separated files with phrases from the Rotten Tomatoes dataset. The data are split into \textbf{train/test} sets, and the sentences are shuffled from their original order.
~\\
\begin{itemize}
\item Each sentence has been parsed into many \textbf{phrases} by the Stanford parser.
\item Each phrase has a \textbf{PhraseId}.
\item Each sentence has a \textbf{SentenceId}.
\item Phrases that are repeated (such as short/common words) are only included once in the data.
\end{itemize}
~\\
The training set contains 156000 examples and the test set contains 66300 phrases. The following table shows several phrases and their sentiment scores.
\begin{table}[!b]
\begin{tabular}{|c|c|l|c|}
\hline
PhraseId & SentenceId & Phrase& Sentiment\\ \hline
8140& 336 &of inept filmmaking &1\\ \hline
8143& 336 &joyless , idiotic , annoying , heavy-handed ,& 0\\ \hline
8146& 336 &joyless& 0 \\ \hline
8147& 336 &idiotic , annoying , heavy-handed& 2\\ \hline
\end{tabular}
\caption{Example of phrases from the training data and their scores}
\end{table}
We will use the CoNLL-2003 shared task data files. These files contain four columns separated by a single space. Each word is on a separate line, and there is an empty line after each sentence. The first item on each line is a word, the second a part-of-speech (POS) tag, the third a syntactic chunk tag, and the fourth the named entity tag. The chunk tags and the named entity tags have the format I-TYPE, which means that the word is inside a phrase of type TYPE. Only when two phrases of the same type immediately follow each other does the first word of the second phrase carry the tag B-TYPE, to show that it starts a new phrase. A word with tag O is not part of a phrase.\\
The data consist of three files: one training file and two test files, testa and testb. The first test file will be used in the development phase to find good parameters for the learning system. The second test file will be used for the final evaluation.
Data are available at :\\
train : \\
testa : \\
testb : \\
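The column layout and tagging scheme described above can be illustrated with a minimal Python sketch. It is not part of the project material: the function names `read_conll` and `extract_entities` are hypothetical, and skipping `-DOCSTART-` marker lines is an assumption about the distributed files that the description above does not mention.

```python
from typing import Iterable, List, Tuple

# One token: (word, POS tag, chunk tag, named entity tag)
Token = Tuple[str, str, str, str]


def read_conll(lines: Iterable[str]) -> List[List[Token]]:
    """Group four-column lines into sentences; an empty line ends a sentence."""
    sentences: List[List[Token]] = []
    current: List[Token] = []
    for raw in lines:
        fields = raw.split()
        if not fields:
            # Empty line: close the current sentence, if any.
            if current:
                sentences.append(current)
                current = []
            continue
        if fields[0] == "-DOCSTART-":
            # Assumption: document-boundary markers are not real tokens.
            continue
        if len(fields) != 4:
            raise ValueError(f"expected 4 columns, got {len(fields)}: {raw!r}")
        current.append((fields[0], fields[1], fields[2], fields[3]))
    if current:
        sentences.append(current)
    return sentences


def extract_entities(sentence: List[Token]) -> List[Tuple[str, str]]:
    """Return (entity text, entity type) pairs from I-TYPE / B-TYPE / O tags."""
    entities: List[Tuple[str, str]] = []
    words: List[str] = []
    etype = None
    for word, _pos, _chunk, tag in sentence:
        if tag == "O":
            # Outside any phrase: close the open entity, if any.
            if words:
                entities.append((" ".join(words), etype))
            words, etype = [], None
            continue
        prefix, typ = tag.split("-", 1)
        if prefix == "B" or typ != etype:
            # B-TYPE, or a change of type, starts a new entity.
            if words:
                entities.append((" ".join(words), etype))
            words, etype = [word], typ
        else:
            # I-TYPE with the same type continues the open entity.
            words.append(word)
    if words:
        entities.append((" ".join(words), etype))
    return entities


sample = [
    "U.N. NNP I-NP I-ORG",
    "official NN I-NP O",
    "Ekeus NNP I-NP I-PER",
    "heads VBZ I-VP O",
    "for IN I-PP O",
    "Baghdad NNP I-NP I-LOC",
    ". . O O",
    "",
]
for sent in read_conll(sample):
    print(extract_entities(sent))
```

In practice the same functions could be applied to an opened training or test file, since iterating over a file object yields its lines one at a time.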
\newpage
\section{Evaluation}
......
......@@ -62,6 +62,8 @@ What is an annotated bibliography ? & DESCRIPTION \\ \hline
\end{table}
Data are available at :
\newpage
\section{Evaluation}
......
......@@ -86,7 +86,7 @@ Ci-dessous quelques points pour simplifier l'organisation et l'échange entre le
\end{enumerate}
Your work will be assessed through project defenses after the end of the Apprentissage automatique en lange module for the work-study students and after the end of the multilinguisme module for the full-time students.\\
Your work will be assessed through project defenses after the end of the Apprentissage automatique en langue module for the work-study students and after the end of the multilinguisme module for the full-time students.\\
......