Stemming in agglutinative languages: A probabilistic stemmer for Turkish


DINCER B., Karaoglan B.

COMPUTER AND INFORMATION SCIENCES - ISCIS 2003, vol. 2869, pp. 244-251, 2003 (Journal Indexed in SCI)

  • Volume number: 2869
  • Publication date: 2003
  • Journal name: COMPUTER AND INFORMATION SCIENCES - ISCIS 2003
  • Pages: pp. 244-251

Abstract

In this paper, we introduce a new lexicon-free, probabilistic stemmer for use in a Turkish information retrieval system that is under development. It has linear computational complexity and achieves a success rate of 95.8% in testing. The main contribution of this paper is a thorough description of a probabilistic perspective on stemming, which can also be generalized to other agglutinative languages such as Finnish, Hungarian, Estonian and Czech.