Zipf's law of burstiness in Turkish: The length of intervals between repetitions


KOCABAŞ İ. , KIŞLA T. , KARAOĞLAN B.

22nd International Symposium on Computer and Information Sciences, Ankara, Türkiye, 7-9 November 2007, pp. 121-123

Abstract

Zipf's law of burstiness of content words has been studied less than his laws describing the relation between the rank and the frequency of words. For words belonging to the same frequency class, Zipf counted the number of intervals of each length between successive repetitions of a word. Using a 260,000-word English corpus, he showed empirically that the number of intervals, F, of a given size is inversely proportional to a power of the interval size, I, between occurrences of a word: $F \propto I^{-p}$, where p varied between 1 and 1.3. In this study we investigated the validity of the law of burstiness on a Turkish corpus of size 55,000 and found p varying between 0.5 and 0.8.
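
The counting procedure described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, the interval is assumed to be the difference in token positions between consecutive occurrences of a word, and the exponent p is assumed to be estimated by an ordinary least-squares fit on log-log values, which the abstract does not specify.

```python
import math
from collections import Counter, defaultdict


def repetition_intervals(tokens):
    """Map each word to the gaps between its consecutive occurrences.

    Assumption: a gap is the difference in token positions.
    """
    last_seen = {}
    gaps = defaultdict(list)
    for pos, word in enumerate(tokens):
        if word in last_seen:
            gaps[word].append(pos - last_seen[word])
        last_seen[word] = pos
    return gaps


def interval_counts(tokens, frequency):
    """Count intervals by size, pooled over all words that occur
    exactly `frequency` times (one frequency class)."""
    word_freq = Counter(tokens)
    counts = Counter()
    for word, word_gaps in repetition_intervals(tokens).items():
        if word_freq[word] == frequency:
            counts.update(word_gaps)
    return counts


def estimate_exponent(counts):
    """Estimate p in F ∝ I^(-p) as the negated least-squares slope
    of log F against log I (an assumed fitting method)."""
    if len(counts) < 2:
        raise ValueError("need at least two distinct interval sizes")
    xs = [math.log(size) for size in counts]
    ys = [math.log(number) for number in counts.values()]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return -sxy / sxx
```

Given a tokenized corpus, `estimate_exponent(interval_counts(tokens, frequency))` would yield one value of p per frequency class, which could then be compared with the ranges reported above.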