Commit 0356c97c authored by Fethi Bougares

TREC project

parent bcf985de
\catcode `:\active
\catcode `;\active
\catcode `!\active
\catcode `?\active
\@writefile{toc}{\contentsline {section}{\numberline {1}Project Description}{1}}
\@writefile{toc}{\contentsline {section}{\numberline {2}Data Description }{1}}
\@writefile{lot}{\contentsline {table}{\numberline {1}{\ignorespaces Example of sentences with their corresponding classes from the training set}}{1}}
\@writefile{toc}{\contentsline {section}{\numberline {3}Evaluation}{2}}
\@writefile{toc}{\contentsline {section}{\numberline {4}Project Roadmap}{2}}
\@writefile{toc}{\contentsline {section}{\numberline {5}Reference}{2}}
{\bf Question Classification} \\[5mm]
\bf Machine Learning for Languages Project (Projet App Auto en langues) \\
{\bf 2018/2019} \\[2mm]
\section{Project Description}
Question classification (QC) is the task of predicting the entity type of the answer expected by a question written in natural language.
In this project, we implement a machine learning model for question classification. The model will be trained and evaluated on the TREC dataset, a collection of questions annotated with one of 6 classes:
\item ABBREVIATION ~~~~~ $\rightarrow$ abbreviation
\item ENTITY ~~~~~~~~~~~~~~~~~ $\rightarrow$ entities
\item DESCRIPTION ~~~~~~~ $\rightarrow$ description and abstract concepts
\item HUMAN ~~~~~~~~~~~~~~~~~ $\rightarrow$ human beings
\item LOCATION ~~~~~~~~~~~~ $\rightarrow$ locations
\item NUMERIC ~~~~~~~~~~~~~~ $\rightarrow$ numeric values
\section{Data Description }
The TREC dataset is a collection of 6000 labeled questions. It consists of two separate sets of 5500 and 500
questions: the first is used as the training set and the second as an independent test set.
The dataset was first published by the University of Illinois Urbana-Champaign (UIUC). It is usually referred to as the UIUC dataset, and sometimes as the TREC dataset because it is widely used in the Text REtrieval Conference (TREC).
Sentence & Class \\ \hline
What team did baseball 's St. Louis Browns become ? & HUMAN \\ \hline
What are liver enzymes ? & DESCRIPTION \\ \hline
When was Ozzy Osbourne born ? & NUMERIC \\ \hline
Who was The Pride of the Yankees ? & HUMAN \\ \hline
What sprawling U.S. state boasts the most airports ? & LOCATION \\ \hline
What is an annotated bibliography ? & DESCRIPTION \\ \hline
\caption{Example sentences with their corresponding classes from the training set}
\section{Evaluation}
Systems are evaluated on classification accuracy (the percentage of labels predicted correctly) over the test questions. We would also like to have the precision and recall scores for each class.
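
As a sketch of how these scores can be written down, assume the following notation, which is introduced here for illustration only: $N$ is the number of test questions, $y_i$ is the reference class of question $i$, $\hat{y}_i$ is the predicted class, and for a class $c$ the counts $TP_c$, $FP_c$ and $FN_c$ are its true positives, false positives and false negatives.
\[
\mathrm{Accuracy} = \frac{1}{N} \sum_{i=1}^{N} \mathbf{1}[\hat{y}_i = y_i]
\]
\[
\mathrm{Precision}_c = \frac{TP_c}{TP_c + FP_c}, \qquad
\mathrm{Recall}_c = \frac{TP_c}{TP_c + FN_c}
\]
For instance, with hypothetical counts $TP_c = 40$, $FP_c = 10$ and $FN_c = 5$, the precision of class $c$ is $40/50 = 0.80$ and its recall is $40/45 \approx 0.89$.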
\section{Project Roadmap}
\item Preprocess and prepare the training data
\item Train and evaluate a vanilla deep recurrent neural network (RNN)
\item Use the PyTorch framework to train the RNN.
\item Optimize the model and propose enhancements (regularization, network initialization, new architectures)
\item Prepare the final defense
\item Present your model and the obtained results
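
The first roadmap step (preprocessing) could look like the minimal Python sketch below. It rests on assumptions not stated in this document: that each line of the data files has the form \verb|COARSE:fine question text|, and that the coarse labels are abbreviated as ABBR, ENTY, DESC, HUM, LOC and NUM. The helper names \verb|parse_line|, \verb|build_vocab| and \verb|encode| are hypothetical, and the two sample lines only illustrate the assumed format.

```python
from collections import Counter

# Assumed mapping from abbreviated coarse labels in the data files
# to the six class names listed in the project description.
COARSE = {"ABBR": "ABBREVIATION", "ENTY": "ENTITY", "DESC": "DESCRIPTION",
          "HUM": "HUMAN", "LOC": "LOCATION", "NUM": "NUMERIC"}


def parse_line(line):
    """Split 'COARSE:fine question text' into (class name, token list)."""
    label, _, question = line.strip().partition(" ")
    coarse = label.split(":")[0]
    # Questions are already tokenized with spaces; lowercase and split.
    return COARSE[coarse], question.lower().split()


def build_vocab(token_lists, min_count=1):
    """Map each token to an integer id; 0 is padding, 1 is unknown."""
    counts = Counter(tok for toks in token_lists for tok in toks)
    vocab = {"<pad>": 0, "<unk>": 1}
    for tok, n in sorted(counts.items()):
        if n >= min_count:
            vocab[tok] = len(vocab)
    return vocab


def encode(tokens, vocab):
    """Replace tokens by ids, falling back to the unknown id."""
    return [vocab.get(tok, vocab["<unk>"]) for tok in tokens]


# Illustrative lines in the assumed file format.
sample = [
    "DESC:def What are liver enzymes ?",
    "NUM:date When was Ozzy Osbourne born ?",
]
parsed = [parse_line(line) for line in sample]
vocab = build_vocab([toks for _, toks in parsed])
encoded = [encode(toks, vocab) for _, toks in parsed]
```

The resulting id sequences can then be padded to a common length and fed to an embedding layer in front of the RNN. The vocabulary should be built from the training set only, so that unseen test words map to the unknown id.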
\section{Reference}
Xin Li and Dan Roth. Learning Question Classifiers. COLING'02, August 2002.
\ No newline at end of file