Compression-Based Similarity

Publication type: Article, Preprint, Conference object

Date: 01 Jun 2011 (embargo end date: 01 Jan 2011)

Country: Netherlands

Publisher: IEEE

Journal: 2011 First International Conference on Data Compression, Communications and Processing

Authors: Vitányi, P.M.B.

DOI: 10.1109/ccp.2011.50, 10.48550/arxiv.1110.4544

arXiv: 1110.4544

handle: 11245.1/f93e4b06-5893-4d80-9c42-d288c14c4ab6

Abstract

First, we consider pairwise distances between literal objects consisting of finite binary files. These files are taken to contain all of their meaning, like genomes or books. The distances are based on compression of the objects concerned, are normalized, and can be viewed as similarity distances. Second, we consider pairwise distances between names of objects, like "red" or "christianity". In this case the distances are based on searches of the Internet. Such a search can be performed by any search engine that returns aggregate page counts. We can extract a code length from the numbers returned, use the same formula as before, and derive a similarity or relative semantics between names for objects. The theory is based on Kolmogorov complexity. We test both similarities extensively in experiments.
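The abstract describes both distances only in words. As a rough illustration, the sketch below implements the normalized compression distance (NCD) for files and the normalized Google distance (NGD) for names, in the form they are usually stated in Cilibrasi and Vitányi's related work; it is not the paper's own code. Assumptions: zlib at level 9 stands in for the compressor C, the helper names `compressed_size`, `ncd`, and `ngd` are hypothetical, and `ngd` takes page counts the caller has already obtained (f(x), f(y), f(x, y) and the total page count N) instead of querying any search engine. Both functions share the same normalized shape, which is the "same formula" the abstract refers to.

```python
import math
import zlib


def compressed_size(data: bytes) -> int:
    """Approximate C(x) by the length of zlib's level-9 output (assumed compressor)."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance:
    NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)),
    where xy is the concatenation of x and y.
    """
    cx = compressed_size(x)
    cy = compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


def ngd(fx: float, fy: float, fxy: float, n: float) -> float:
    """Normalized Google distance from aggregate page counts (all positive):
    NGD(x, y) = (max(log f(x), log f(y)) - log f(x, y))
                / (log N - min(log f(x), log f(y))).

    fx, fy: pages containing each name; fxy: pages containing both;
    n: total number of pages indexed. The log base cancels in the ratio.
    """
    lx, ly, lxy = math.log(fx), math.log(fy), math.log(fxy)
    return (max(lx, ly) - lxy) / (math.log(n) - min(lx, ly))


if __name__ == "__main__":
    a = b"the quick brown fox jumps over the lazy dog " * 20
    b = b"the quick brown fox jumps over the lazy cat " * 20
    unrelated = bytes(range(256)) * 4
    print(f"NCD(a, b)         = {ncd(a, b):.3f}")
    print(f"NCD(a, unrelated) = {ncd(a, unrelated):.3f}")
    # Hypothetical page counts for illustration, not real search-engine numbers.
    print(f"NGD = {ngd(fx=1000, fy=1000, fxy=500, n=1e6):.3f}")
```

Because a real compressor only approximates Kolmogorov complexity, NCD(x, x) comes out close to 0 rather than exactly 0, and values slightly above 1 can occur for unrelated inputs.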

LaTeX, 8 pages, 2 figures, in Proc. IEEE 1st Int. Conf. Data Compression, Communication and Processing, Palinuro, Italy, June 21-24, 2011, pp. 111-118

Country

Netherlands

Related Organizations

University of Amsterdam
Netherlands

Keywords

FOS: Computer and information sciences, Computer Science - Information Theory, Information Theory (cs.IT), 004

Impact by BIP!

- Selected citations: 1. These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
- Popularity: Average. Reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
- Influence: Average. Reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
- Impulse: Average. Reflects the initial momentum of an article directly after its publication, based on the underlying citation network.

Open access route: Green

Fields of Science

engineering and technology

electrical engineering, electronic engineering, information engineering

Related to research communities

Netherlands Research Portal