LDA-based Topic Modelling in Text Sentiment Classification: An Empirical Analysis


Onan A. , Korukoğlu S. , Bulut H.

International Journal of Computational Linguistics and Applications, no.7, pp.101-119, 2016 (Peer-Reviewed Journals of Other Institutions)

  • Publication Date: 2016
  • Journal Name: International Journal of Computational Linguistics and Applications
  • Page Numbers: pp.101-119

Abstract

Sentiment analysis is the process of identifying the subjective information that source materials express towards an entity. It is a subfield of text and web mining, and the Web is a rich and steadily expanding source of information. Sentiment analysis can be modelled as a text classification problem. Text classification suffers from high-dimensional feature spaces and feature sparsity. Conventional schemes for representing text documents can be extremely costly, especially for large text collections, so data reduction techniques are viable tools for representing document collections. Latent Dirichlet allocation (LDA) is a popular generative probabilistic model for representing collections of discrete data. This paper examines the performance of LDA in text sentiment classification. In the empirical analysis, five classification algorithms (Naïve Bayes, support vector machines, logistic regression, radial basis function network and K-nearest neighbor) and five ensemble methods (Bagging, AdaBoost, Random Subspace, voting and stacking) are evaluated on four sentiment datasets.
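To illustrate how LDA can act as a data reduction step, the following minimal sketch maps each document to a short vector of topic proportions, which could then be passed to any of the classifiers listed above in place of a sparse bag-of-words vector. It is not the authors' implementation: the abstract does not state how LDA was fitted, so collapsed Gibbs sampling is assumed here, and the function name `lda_gibbs`, the hyperparameters `alpha` and `beta`, the number of topics and the toy documents are all hypothetical choices made for illustration.

```python
import random


def lda_gibbs(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA by collapsed Gibbs sampling (an assumed inference method).

    docs is a list of token lists. Returns one list of topic proportions
    per document; each list has num_topics entries that sum to 1.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    vocab_size = len(vocab)

    # Count tables maintained by the sampler.
    doc_topic = [[0] * num_topics for _ in docs]
    topic_word = [[0] * vocab_size for _ in range(num_topics)]
    topic_total = [0] * num_topics

    # Random initial topic assignment for every token.
    assignments = []
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            z = rng.randrange(num_topics)
            z_doc.append(z)
            doc_topic[d][z] += 1
            topic_word[z][word_id[w]] += 1
            topic_total[z] += 1
        assignments.append(z_doc)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                wid = word_id[w]
                # Remove the token's current assignment from the counts.
                z = assignments[d][i]
                doc_topic[d][z] -= 1
                topic_word[z][wid] -= 1
                topic_total[z] -= 1
                # Conditional probability of each topic, up to a constant.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][wid] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(num_topics)
                ]
                z = rng.choices(range(num_topics), weights=weights)[0]
                # Record the newly sampled assignment.
                assignments[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][wid] += 1
                topic_total[z] += 1

    # Smoothed per-document topic proportions (the reduced representation).
    theta = []
    for d, doc in enumerate(docs):
        denom = len(doc) + num_topics * alpha
        theta.append([(doc_topic[d][k] + alpha) / denom for k in range(num_topics)])
    return theta


# Hypothetical toy reviews, already tokenized.
docs = [
    "great film loved the acting".split(),
    "loved the plot great acting".split(),
    "terrible film boring plot".split(),
    "boring acting terrible waste".split(),
]
theta = lda_gibbs(docs, num_topics=2)
for tokens, proportions in zip(docs, theta):
    print(" ".join(tokens), [round(p, 3) for p in proportions])
```

In this sketch every document is represented by only two numbers regardless of vocabulary size, which is the kind of compact representation the abstract refers to. The topics are learned without sentiment labels, so they need not align with positive and negative polarity; a classifier trained on these vectors would still be required.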