q-frame hash comparison based exact string matching algorithms for DNA sequences


KARCIOĞLU A. A. , BULUT H.

CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE, 2021 (Journal Indexed in SCI)

  • Publication Type: Article
  • Volume:
  • Publication Date: 2021
  • Doi Number: 10.1002/cpe.6505
  • Title of Journal: CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE
  • Keywords: DNA sequences, hash function, hash-based string matching, pattern matching, sequence analysis, string matching algorithms

Abstract

String matching is important because of its applications in many fields, such as medicine and bioinformatics. Various string matching algorithms have been developed to speed up the search. In particular, hash-based exact string matching algorithms are among the most time-efficient. The efficiency of hash-based approaches depends on the hash function; hence, perfect hashing plays an essential role in hash-based string matching. In this study, two q-frame hash comparison-based exact string matching algorithms, Hq-QF and HqBM-QF, are proposed. Both use a collision-free perfect hash function for DNA sequences. In the first approach, after the hash values match for the last q characters, the character comparisons of the Hash-q algorithm are replaced with q-frame hash comparisons. The second approach improves on the first by utilizing the shift size indicated at the (m - 1)th entry of the good suffix shift table. Since the number of character comparisons is minimized, the worst-case time complexity of the proposed algorithms is O(n(m - (⌊m/q⌋ q))). In both approaches, q-frame hash comparisons replace most character comparisons as a trade-off. The results show that the proposed approaches are more efficient than the Hash-q algorithm in terms of both runtime and the number of character comparisons.