IMDEA Networks Institute Publications Repository

Graph-based techniques for tweet classification in Spanish

Cordobés de la Calle, Héctor (2014) Graph-based techniques for tweet classification in Spanish. Masters thesis, Universidad Carlos III de Madrid, Spain.



Topic classification of texts is one of the most interesting challenges in Natural Language Processing (NLP). Topic classifiers commonly use a bag-of-words approach, in which the classifier uses (and is trained with) selected terms from the input texts. In this work we present techniques based on graph similarity to classify short texts by topic. Our classifier builds graphs from the input texts and then uses properties of these graphs to classify them. We tested the resulting algorithm by classifying Twitter messages in Spanish into a predefined set of topics, achieving more than 70% accuracy.
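
The abstract does not specify how the graphs are built or which graph properties are compared, so the following is only a minimal sketch of one possible graph-similarity classifier, not the thesis's method. It assumes word co-occurrence graphs (an edge links two words that appear within a small window of each other), one merged graph per topic, and an edge-overlap score as the similarity measure. All function names, the window size, and the sample tweets are hypothetical.

```python
from collections import Counter


def text_to_graph(text, window=2):
    """Build a co-occurrence graph as a Counter of undirected word-pair edges.

    Assumption: whitespace tokenization and a fixed look-ahead window.
    """
    tokens = text.lower().split()
    edges = Counter()
    for i, left in enumerate(tokens):
        for right in tokens[i + 1 : i + 1 + window]:
            if left != right:
                edges[frozenset((left, right))] += 1
    return edges


def build_topic_graphs(labeled_texts, window=2):
    """Merge the graphs of all training texts that share a topic label."""
    topic_graphs = {}
    for text, topic in labeled_texts:
        graph = text_to_graph(text, window)
        topic_graphs.setdefault(topic, Counter()).update(graph)
    return topic_graphs


def similarity(text_graph, topic_graph):
    """Fraction of the text's edges that also occur in the topic graph."""
    if not text_graph:
        return 0.0
    shared = sum(1 for edge in text_graph if edge in topic_graph)
    return shared / len(text_graph)


def classify(text, topic_graphs, window=2):
    """Return the topic whose graph is most similar to the text's graph."""
    graph = text_to_graph(text, window)
    return max(topic_graphs, key=lambda t: similarity(graph, topic_graphs[t]))


# Hypothetical two-topic training set, for illustration only.
training = [
    ("el equipo ganó el partido de fútbol", "deportes"),
    ("el gobierno aprobó la nueva ley", "política"),
]
topic_graphs = build_topic_graphs(training)
predicted = classify("ganó el partido", topic_graphs)
```

Unlike a bag-of-words model, this representation keeps some word-order context, because an edge exists only when two words occur near each other. A real system would likely need proper tokenization and normalization for Spanish tweets, which this sketch omits.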

Item Type: Theses (Masters)
Uncontrolled Keywords: Topic classification, text classification, graphs, natural language processing.
Depositing User: Rebeca De Miguel
Date Deposited: 13 Mar 2015 17:16
Last Modified: 06 May 2015 11:51
