Comparison of Topic Modeling Methods for Type Detection of Turkish News


Güven Z. A., Diri B., Çakaloğlu T.

4th International Conference on Computer Science and Engineering (UBMK), Samsun, Turkey, 11 - 15 September 2019

  • Publication Type: Conference Paper / Full Text
  • Doi Number: 10.1109/ubmk.2019.8907050
  • City: Samsun
  • Country: Turkey
  • Keywords: Topic Modelling, Latent Dirichlet Allocation, Natural Language Processing, News Analysis, Non-Negative Matrix Factorization, Latent Semantic Analysis

Abstract

Today, with the growth of Internet-based documents, we face large amounts of data that need to be processed and evaluated. Media, news and advertising are some of the areas where these data are evaluated. For news, classification by people in the media sector is an important problem in terms of time. This paper aims to determine which type each news title belongs to. The dataset consists of 4200 Turkish news titles belonging to 7 class labels. To determine the types, the classical Latent Dirichlet Allocation (LDA), Latent Semantic Analysis (LSA) and Non-Negative Matrix Factorization (NMF) algorithms were used for topic modeling. In addition, the LDA-based n-LDA method was used. The accuracy of each method was measured and compared. NMF was the most successful method for three classes, while LSA was the most successful for five and seven classes.
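
To make the evaluation idea concrete, the following is a minimal sketch of how a topic model such as NMF could be scored against class labels. It is not the authors' implementation. The abstract does not describe the preprocessing, the term weighting, or how topics were matched to the classes, so everything here is an assumption: the toy English titles stand in for the Turkish dataset, raw word counts are used, each title is assigned to its strongest topic, and each topic is mapped to the majority label of its titles. The helper names (`build_matrix`, `nmf`, `topic_accuracy`) are hypothetical.

```python
import random
from collections import Counter

# Hypothetical toy data: English stand-ins for news titles, 3 classes.
titles = [
    ("team wins league match", "sports"),
    ("coach praises team after match", "sports"),
    ("striker scores in league final", "sports"),
    ("bank raises interest rates", "economy"),
    ("central bank inflation report", "economy"),
    ("markets react to interest rates", "economy"),
    ("new vaccine trial results", "health"),
    ("hospital expands vaccine program", "health"),
    ("doctors warn about flu season", "health"),
]

def build_matrix(texts):
    """Return the vocabulary and a document-term count matrix (list of rows)."""
    vocab = sorted({w for t in texts for w in t.split()})
    index = {w: j for j, w in enumerate(vocab)}
    rows = []
    for t in texts:
        row = [0.0] * len(vocab)
        for w in t.split():
            row[index[w]] += 1.0
        rows.append(row)
    return vocab, rows

def transpose(a):
    return [list(col) for col in zip(*a)]

def matmul(a, b):
    bt = list(zip(*b))  # columns of b
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]

def nmf(v, k, iters=200, seed=0, eps=1e-9):
    """Factor v (docs x terms) into nonnegative w (docs x k) and h (k x terms)
    with Lee-Seung multiplicative updates for the Frobenius objective."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(k)]
    for _ in range(iters):
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)]
             for i in range(k)]
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)]
             for i in range(n)]
    return w, h

def topic_accuracy(w, labels):
    """Assign each document to its strongest topic, map each topic to the
    majority label of its documents (an assumed matching rule), and score."""
    assigned = [max(range(len(row)), key=row.__getitem__) for row in w]
    by_topic = {}
    for topic, label in zip(assigned, labels):
        by_topic.setdefault(topic, []).append(label)
    topic_label = {t: Counter(ls).most_common(1)[0][0]
                   for t, ls in by_topic.items()}
    correct = sum(topic_label[t] == y for t, y in zip(assigned, labels))
    return assigned, correct / len(labels)

texts = [t for t, _ in titles]
labels = [y for _, y in titles]
vocab, v = build_matrix(texts)
w, h = nmf(v, k=3)
assigned, accuracy = topic_accuracy(w, labels)
print(f"accuracy = {accuracy:.2f}")
```

The same scoring function could be applied to document-topic weights from LDA or LSA to compare methods on equal terms. In practice a library implementation would likely replace the hand-written factorization, and the majority-vote mapping tends to be optimistic because it is fitted on the same labels it is scored on.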