Commit 772e3ed8 authored by Fethi Bougares

add corpus NER

parent 8137148d
\relax
\providecommand\hyper@newdestlabel[2]{}
\catcode `:\active
\catcode `;\active
\catcode `!\active
\catcode `?\active
\providecommand\HyperFirstAtBeginDocument{\AtBeginDocument}
\HyperFirstAtBeginDocument{\ifx\hyper@anchor\@undefined
\global\let\oldcontentsline\contentsline
\gdef\contentsline#1#2#3#4{\oldcontentsline{#1}{#2}{#3}}
\global\let\oldnewlabel\newlabel
\gdef\newlabel#1#2{\newlabelxx{#1}#2}
\gdef\newlabelxx#1#2#3#4#5#6{\oldnewlabel{#1}{{#2}{#3}}}
\AtEndDocument{\ifx\hyper@anchor\@undefined
\let\contentsline\oldcontentsline
\let\newlabel\oldnewlabel
\fi}
\fi}
\global\let\hyper@last\relax
\gdef\HyperFirstAtBeginDocument#1{#1}
\providecommand\HyField@AuxAddToFields[1]{}
\providecommand\HyField@AuxAddToCoFields[2]{}
\babel@aux{french}{}
\@writefile{toc}{\contentsline {section}{\numberline {1}Project Description}{1}}
\@writefile{toc}{\contentsline {section}{\numberline {2}Data Description }{1}}
\@writefile{toc}{\contentsline {section}{\numberline {3}Evaluation}{2}}
\@writefile{toc}{\contentsline {section}{\numberline {4}Project Roadmap}{2}}
\@writefile{toc}{\contentsline {section}{\numberline {1}Project Description}{1}{section.1}}
\@writefile{toc}{\contentsline {section}{\numberline {2}Data Description }{1}{section.2}}
\@writefile{toc}{\contentsline {section}{\numberline {3}Evaluation}{2}{section.3}}
\@writefile{toc}{\contentsline {section}{\numberline {4}Project Roadmap}{2}{section.4}}
......@@ -4,6 +4,7 @@
\usepackage{lmodern}
\usepackage[a4paper]{geometry}
\usepackage{babel}
\usepackage{hyperref}
\usepackage{dblfloatfix}
\begin{document}
......@@ -45,12 +46,15 @@ We will concentrate on four types of named entities: persons, locations, organiz
We will use the CoNLL-2003 shared task data files. Each file has four columns separated by a single space, with one word per line and an empty line after each sentence. The first item on each line is the word, the second a part-of-speech (POS) tag, the third a syntactic chunk tag, and the fourth the named entity tag. The chunk tags and the named entity tags have the format I-TYPE, which means that the word is inside a phrase of type TYPE. The tag B-TYPE is used only when two phrases of the same type immediately follow each other; in that case the first word of the second phrase is tagged B-TYPE to show that it starts a new phrase. A word with tag O is not part of any phrase.\\
The data consists of three files : one training file and two test files testa and testb. The first test file will be used in the development phase for finding good parameters for the learning system. The second test file will be used for the final evaluation.
The data consists of three files : one training file (about 15k sentences) and two test files testa and testb. The first test file will be used in the development phase for finding good parameters for the learning system. The second test file will be used for the final evaluation.\\
Data are available at :\\
train : \\
testa : \\
testb : \\
\vspace{-1mm}
\hspace{2cm} train : \url{http://perso.univ-lemans.fr/~fbouga/eng.train}\\
\vspace{-1mm}
\hspace{2cm} testa : \url{http://perso.univ-lemans.fr/~fbouga/eng.testa} \\
\vspace{-1mm}
\hspace{2cm} testb : \url{http://perso.univ-lemans.fr/~fbouga/eng.testb} \\
\newpage
\section{Evaluation}
......
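The column format described in the Data Description hunk above can be read with a short script. The following is a minimal sketch, not part of the commit. The function names `read_conll` and `entity_spans` are hypothetical, the sample lines are illustrative, and skipping `-DOCSTART-` marker lines is an assumption about the data files that the hunk itself does not mention.

```python
from typing import Iterable, List, Tuple

# One token row: (word, POS tag, chunk tag, named entity tag)
Token = Tuple[str, str, str, str]


def read_conll(lines: Iterable[str]) -> List[List[Token]]:
    """Group four-column lines into sentences; an empty line ends a sentence."""
    sentences: List[List[Token]] = []
    current: List[Token] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            if current:
                sentences.append(current)
                current = []
            continue
        # Assumption: document boundary markers are not real tokens.
        if line.startswith("-DOCSTART-"):
            continue
        word, pos, chunk, ne = line.split()
        current.append((word, pos, chunk, ne))
    if current:
        sentences.append(current)
    return sentences


def entity_spans(tags: List[str]) -> List[Tuple[str, int, int]]:
    """Turn I-TYPE / B-TYPE / O tags into (type, start, end) spans, end exclusive."""
    spans: List[Tuple[str, int, int]] = []
    start, kind = None, None
    for i, tag in enumerate(tags):
        prefix, _, tag_type = tag.partition("-")
        # A phrase ends at O, at a B- tag, or when the type changes.
        if tag == "O" or prefix == "B" or tag_type != kind:
            if kind is not None:
                spans.append((kind, start, i))
            start, kind = (None, None) if tag == "O" else (i, tag_type)
    if kind is not None:
        spans.append((kind, start, len(tags)))
    return spans


sample = """U.N. NNP I-NP I-ORG
official NN I-NP O
Ekeus NNP I-NP I-PER
heads VBZ I-VP O
for IN I-PP O
Baghdad NNP I-NP I-LOC
. . O O
"""

sentences = read_conll(sample.splitlines())
ne_tags = [token[3] for token in sentences[0]]
print(entity_spans(ne_tags))
```

Spans are returned with an exclusive end index so that `tokens[start:end]` recovers the words of each entity. A B-TYPE tag closes the running phrase even when the type is unchanged, which is the only case in which the format uses it.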