Languages of lossless seeds

Karel Břinda
(LIGM Université Paris-Est Marne-la-Vallée)

Several algorithms for similarity search employ seeding techniques to quickly discard very dissimilar regions. In this paper, we study theoretical properties of lossless seeds, i.e., spaced seeds having full sensitivity. We prove that lossless seeds coincide with languages of certain sofic subshifts, hence they can be recognized by finite automata. Moreover, we show that these subshifts are fully given by the number of allowed errors k and the seed margin l. We also show that for a fixed k, optimal seeds must asymptotically satisfy l ~ m^(k/(k+1)).
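To make the notion of full sensitivity concrete, the following is a minimal brute-force sketch, not the automata-based characterization from the paper. It assumes a seed is written as a string over `#` (position must match) and `-` (don't care), that m denotes the length of the two compared strings, and that a seed is lossless when, for every placement of k mismatches among the m positions, at least one shift of the seed has no `#` on a mismatch. The function name `is_lossless` and this encoding are illustrative choices, not notation from the source.

```python
from itertools import combinations


def is_lossless(seed: str, m: int, k: int) -> bool:
    """Brute-force check that `seed` detects every pair of length-m
    strings differing in exactly k positions (assumed definition)."""
    span = len(seed)
    if span > m:
        return False  # the seed cannot be placed at all

    # Offsets inside the seed where a match is required.
    must_match = [j for j, c in enumerate(seed) if c == "#"]
    shifts = range(m - span + 1)

    # Enumerate every possible set of k mismatch positions.
    for mismatches in combinations(range(m), k):
        bad = set(mismatches)
        # The seed "hits" at shift i if no required position is a mismatch.
        hit = any(all(i + j not in bad for j in must_match) for i in shifts)
        if not hit:
            return False  # found a mismatch pattern the seed misses
    return True


if __name__ == "__main__":
    print(is_lossless("###", 7, 1))   # contiguous seed of span 3
    print(is_lossless("####", 7, 1))  # contiguous seed of span 4
    print(is_lossless("##-#", 6, 1))  # spaced seed of span 4
```

Under these assumptions, the contiguous seed `###` is lossless for m = 7, k = 1, whereas `####` is not (a mismatch in the middle position leaves no room for it on either side), and the spaced seed `##-#` is lossless already for m = 6, k = 1. The enumeration is exponential in k and is meant only for small illustrative cases.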

In Zoltán Ésik and Zoltán Fülöp: Proceedings 14th International Conference on Automata and Formal Languages (AFL 2014), Szeged, Hungary, May 27-29, 2014, Electronic Proceedings in Theoretical Computer Science 151, pp. 139–150.
Published: 21st May 2014.

ArXived at: https://dx.doi.org/10.4204/EPTCS.151.9