A multiobjective weighted voting ensemble classifier based on differential evolution algorithm for text sentiment classification

Onan A., Korukoglu S., Bulut H.

EXPERT SYSTEMS WITH APPLICATIONS, vol.62, pp.1-16, 2016 (Journal Indexed in SCI)

  • Publication Type: Article
  • Volume: 62
  • Publication Date: 2016
  • Doi Number: 10.1016/j.eswa.2016.06.005
  • Page Numbers: pp.1-16
  • Keywords: Sentiment analysis, Ensemble learning, Weighted majority voting, Multiobjective optimization


Typically performed with supervised machine learning algorithms, sentiment analysis is highly useful for extracting subjective information from online text documents. Most ensemble learning approaches to sentiment analysis rely on feature engineering to enhance predictive performance. In response, we developed a multiobjective-optimization-based weighted voting scheme that assigns appropriate weight values to classifiers and to each output class based on the predictive performance of the classification algorithms, with the aim of enhancing the predictive performance of sentiment classification. The proposed ensemble method is based on static classifier selection involving majority voting error and forward search, as well as a multiobjective differential evolution algorithm. Under the static classifier selection scheme, the ensemble incorporates Bayesian logistic regression, naive Bayes, linear discriminant analysis, logistic regression, and support vector machines as base learners, whose precision and recall values determine the weight adjustment. Experimental analysis on classification tasks including sentiment analysis, software defect prediction, credit risk modeling, spam filtering, and semantic mapping indicates that the proposed scheme predicts better than conventional ensemble learning methods such as AdaBoost, bagging, random subspace, and majority voting. Of all datasets examined, the laptop dataset showed the best classification accuracy (98.86%). (C) 2016 Elsevier Ltd. All rights reserved.
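The core combination rule described in the abstract, weighted majority voting with per-classifier, per-class weights, can be sketched as follows. This is a minimal illustration, not the paper's implementation: the weight values below are hard-coded for demonstration, whereas the paper derives them from precision and recall via a multiobjective differential evolution algorithm.

```python
def weighted_vote(predictions, weights, classes):
    """Combine base-learner predictions with per-classifier,
    per-class weights.

    predictions: list of predicted class labels, one per base learner
    weights:     weights[i][c] = weight of learner i when it votes class c
    classes:     iterable of all possible class labels
    """
    scores = {c: 0.0 for c in classes}
    for i, pred in enumerate(predictions):
        # Each learner contributes its class-specific weight
        # to the class it voted for.
        scores[pred] += weights[i][pred]
    return max(scores, key=scores.get)


# Three hypothetical base learners voting on {"pos", "neg"};
# the weight values are illustrative placeholders.
weights = [
    {"pos": 0.9, "neg": 0.4},  # learner 0: reliable on "pos"
    {"pos": 0.3, "neg": 0.2},  # learner 1: weak overall
    {"pos": 0.5, "neg": 0.3},  # learner 2: moderate
]

# Two learners vote "neg", but the single high-weight "pos" vote wins:
# pos = 0.9 vs. neg = 0.2 + 0.3 = 0.5
print(weighted_vote(["pos", "neg", "neg"], weights, ["pos", "neg"]))  # → pos
```

The example shows why class-specific weights matter: a plain majority vote would return "neg" here, while the weighted scheme lets a single trusted learner override two unreliable ones.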