Similarity measures for sequential data

Publication: Article, 20 May 2011, English

Publisher: Wiley

Journal: WIREs Data Mining and Knowledge Discovery, volume 1, pages 296-304 (ISSN: 1942-4787, eISSN: 1942-4795)

Authors: Konrad Rieck

doi: 10.1002/widm.36


Abstract

Expressive comparison of strings is a prerequisite for analysis of sequential data in many areas of computer science. However, comparing strings and assessing their similarity is not a trivial task, and there exist several contrasting approaches for defining similarity measures over sequential data. In this paper, we review three major classes of such similarity measures: edit distances, bag-of-word models, and string kernels. Each of these classes originates from a particular application domain and models similarity of strings differently. We present these classes and underlying comparisons in detail, highlight their advantages and differences, and provide basic algorithms supporting practical applications.

© 2011 John Wiley & Sons, Inc. WIREs Data Mining Knowl Discov 2011, 1, 296–304. DOI: 10.1002/widm.36

This article is categorized under:

- Algorithmic Development > Biological Data Mining
- Algorithmic Development > Text Mining
- Fundamental Concepts of Data and Knowledge > Data Concepts
- Fundamental Concepts of Data and Knowledge > Key Design Issues in Data Mining
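As a rough illustration of the three classes of measures named in the abstract, the following Python sketch shows one simple representative of each. It is not taken from the article: the function names, the unit edit costs, the whitespace tokenization, and the choice of a character n-gram (spectrum-style) kernel are all assumptions made for this example.

```python
from collections import Counter


def edit_distance(x: str, y: str) -> int:
    """Levenshtein distance by dynamic programming (unit costs assumed)."""
    prev = list(range(len(y) + 1))  # distances from the empty prefix of x
    for i, cx in enumerate(x, start=1):
        curr = [i]  # transforming x[:i] into the empty string takes i deletions
        for j, cy in enumerate(y, start=1):
            curr.append(min(
                prev[j] + 1,               # delete cx
                curr[j - 1] + 1,           # insert cy
                prev[j - 1] + (cx != cy),  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


def bag_of_words_distance(x: str, y: str) -> int:
    """Manhattan distance between word-count vectors (whitespace tokens assumed)."""
    cx, cy = Counter(x.split()), Counter(y.split())
    return sum(abs(cx[w] - cy[w]) for w in cx.keys() | cy.keys())


def ngram_kernel(x: str, y: str, n: int = 2) -> int:
    """Inner product of character n-gram counts (a spectrum-style string kernel)."""
    gx = Counter(x[i:i + n] for i in range(len(x) - n + 1))
    gy = Counter(y[i:i + n] for i in range(len(y) - n + 1))
    return sum(gx[g] * gy[g] for g in gx.keys() & gy.keys())
```

Note that the first two functions return distances (smaller means more similar), whereas the kernel returns a similarity score (larger means more similar).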

Related Organizations

Technical University of Berlin
Germany

Impact by BIP!

- Selected citations: 9. These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall impact of an article in the research community at large, based on the underlying citation network (diachronically).
- Popularity: Average. This indicator reflects the "current" impact or attention (the "hype") of an article in the research community at large, based on the underlying citation network.
- Influence: Average. This indicator reflects the overall impact of an article in the research community at large, based on the underlying citation network (diachronically).
- Impulse: Average. This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.


Fields of Science

engineering and technology

electrical engineering, electronic engineering, information engineering

