Sentence boundary detection in Turkish


DINCER B., Karaoglan B.

ADVANCES IN INFORMATION SYSTEMS, PROCEEDINGS, vol.3261, pp.255-262, 2004 (Journal Indexed in SCI)

  • Volume: 3261
  • Publication Date: 2004
  • DOI: 10.1007/978-3-540-30198-1_26
  • Journal Name: ADVANCES IN INFORMATION SYSTEMS, PROCEEDINGS
  • Pages: pp.255-262

Abstract

In this paper, we describe a method for sentence boundary detection in Turkish. The method uses simple heuristic knowledge of Turkish syllabification and phonetic rules to disambiguate dots. The algorithm achieves a test accuracy of 96.02%. The main contribution of this study is a new lexicon-free method for distinguishing EOS (end-of-sentence) dots from dots used for other purposes.
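
The abstract does not give the actual rules, so the following is only an illustrative sketch of what a lexicon-free dot heuristic could look like, not the authors' algorithm. It assumes three simple cues: a dot between digits is part of a number, a token with no vowel cannot be syllabified in Turkish (every Turkish syllable contains exactly one vowel) and is therefore likely an abbreviation, and a dot followed by a lowercase letter does not end a sentence. The function names `has_vowel`, `is_eos_dot`, and `split_sentences` are hypothetical.

```python
import re

# Turkish vowels, lowercase and uppercase (assumption: text is plain Turkish).
TURKISH_VOWELS = set("aeıioöuüAEIİOÖUÜ")


def has_vowel(token):
    """Return True if the token contains at least one Turkish vowel."""
    return any(ch in TURKISH_VOWELS for ch in token)


def is_eos_dot(text, i):
    """Guess whether the dot at text[i] ends a sentence (toy heuristic)."""
    assert text[i] == "."
    before, after = text[:i], text[i + 1:]
    # Cue 1: a dot between two digits belongs to a number such as 3.5.
    if before[-1:].isdigit() and after[:1].isdigit():
        return False
    # Cue 2: a letter-only token with no vowel has no valid syllable,
    # so it is treated as an abbreviation such as "Dr".
    match = re.search(r"[^\W\d_]+$", before)
    token = match.group(0) if match else ""
    if token and not has_vowel(token):
        return False
    # Cue 3: a following lowercase letter suggests the sentence continues.
    rest = after.lstrip()
    if rest and rest[0].islower():
        return False
    return True


def split_sentences(text):
    """Split text at the dots that the heuristic classifies as EOS."""
    sentences, start = [], 0
    for i, ch in enumerate(text):
        if ch == "." and is_eos_dot(text, i):
            sentences.append(text[start:i + 1].strip())
            start = i + 1
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)
    return sentences


print(split_sentences("Dr. Ahmet geldi. Fiyat 3.5 lira oldu."))
```

This sketch misclassifies abbreviations that contain a vowel (for example "Prof." before a capitalized name), which is one reason a real system would need the richer syllabification and phonetic rules the paper refers to.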