Stemming in agglutinative languages: A probabilistic stemmer for Turkish


DINCER B., Karaoglan B.

COMPUTER AND INFORMATION SCIENCES - ISCIS 2003, vol.2869, pp.244-251, 2003 (Journal Indexed in SCI)

  • Publication Type: Article
  • Volume: 2869
  • Publication Date: 2003
  • Title of Journal: COMPUTER AND INFORMATION SCIENCES - ISCIS 2003
  • Page Numbers: pp.244-251

Abstract

In this paper, we introduce a new lexicon-free, probabilistic stemmer for use in a Turkish information retrieval system that is under development. The stemmer has linear computational complexity and achieves a success rate of 95.8% in testing. The main contribution of this paper is a thorough description of a probabilistic perspective on stemming, which can also be generalized to other agglutinative languages such as Finnish, Hungarian, Estonian and Czech.