Using geometric techniques such as projection and dimensionality reduction, we show that there exists a randomized sub-linear time algorithm that estimates the Hamming distance between two matrices. Consider two matrices ${\bf A}$ and ${\bf B}$ of size $n \times n$ whose dimensions are known to the algorithm but whose entries are not. The entries of the matrices are real numbers. Access to each matrix is through an oracle that computes the projection (inner product) of a row (or a column) of the matrix on a vector in $\{0,1\}^n$. We call this query oracle an {\sc Inner Product} oracle (shortened as {\sc IP}). Let ${\bf D}_{\bf M} ({\bf A},{\bf B})$ denote the Hamming distance between ${\bf A}$ and ${\bf B}$, that is, the number of corresponding entries in which ${\bf A}$ and ${\bf B}$ differ. We show that our algorithm returns a $(1\pm \varepsilon)$ approximation to ${\bf D}_{\bf M} ({\bf A},{\bf B})$ with high probability by making ${\cal O}\left(\frac{n}{\sqrt{{\bf D}_{\bf M} ({\bf A},{\bf B})}}\mbox{poly}\left(\log n, \frac{1}{\varepsilon}\right)\right)$ oracle queries. We also show a matching lower bound on the number of {\sc IP} queries needed. Though our main result is on estimating ${\bf D}_{\bf M} ({\bf A},{\bf B})$ using {\sc IP}, we also compare our results with other query models.
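To make the IP query model concrete, here is a minimal Python sketch. It is a naive baseline based on adaptive group testing, not the sub-linear algorithm described above, and it makes at least one row-level test per row, so it is not sub-linear. The helpers `make_row_ip_oracle`, `differ_on`, `count_row_mismatches` and `hamming_distance_baseline` are hypothetical names chosen for illustration. The sketch uses row queries only and assumes exact arithmetic (for example, integer entries). The key fact it relies on: if rows ${\bf A}_i$ and ${\bf B}_i$ differ somewhere in a set $S$, then for a uniformly random $v \in \{0,1\}^n$ supported on $S$, the inner products $\langle {\bf A}_i, v\rangle$ and $\langle {\bf B}_i, v\rangle$ differ with probability at least $1/2$.

```python
import random
from typing import Callable, List, Sequence

# Hypothetical row oracle type: (row index i, vector v in {0,1}^n) -> <M[i], v>.
RowIP = Callable[[int, Sequence[int]], float]


def make_row_ip_oracle(M: List[List[float]]) -> RowIP:
    """Wrap an explicit matrix as an IP oracle over its rows (illustration only)."""
    def ip(i: int, v: Sequence[int]) -> float:
        # Inner product with a 0/1 vector = sum of the entries of row i selected by v.
        return sum(x for x, bit in zip(M[i], v) if bit)
    return ip


def differ_on(ip_a: RowIP, ip_b: RowIP, i: int, support: Sequence[int], n: int,
              rng: random.Random, trials: int = 32) -> bool:
    """One-sided randomized test: do rows A[i] and B[i] differ inside `support`?

    If the rows agree on `support`, every pair of answers matches, so the result is False.
    If they differ (exact arithmetic assumed), each random subset of `support`
    exposes the difference with probability >= 1/2, so a miss has probability <= 2**-trials.
    """
    for _ in range(trials):
        v = [0] * n
        for j in support:
            v[j] = rng.randint(0, 1)  # uniformly random subset of `support`
        if ip_a(i, v) != ip_b(i, v):
            return True
    return False


def count_row_mismatches(ip_a: RowIP, ip_b: RowIP, i: int, support: List[int],
                         n: int, rng: random.Random) -> int:
    """Count mismatched positions of row i within `support` by recursive halving."""
    if not support or not differ_on(ip_a, ip_b, i, support, n, rng):
        return 0
    if len(support) == 1:
        return 1  # the one-sided test never reports a difference that is not there
    mid = len(support) // 2
    return (count_row_mismatches(ip_a, ip_b, i, support[:mid], n, rng)
            + count_row_mismatches(ip_a, ip_b, i, support[mid:], n, rng))


def hamming_distance_baseline(ip_a: RowIP, ip_b: RowIP, n: int, seed: int = 0) -> int:
    """Sum the per-row mismatch counts; correct with high probability, never overcounts."""
    rng = random.Random(seed)
    return sum(count_row_mismatches(ip_a, ip_b, i, list(range(n)), n, rng)
               for i in range(n))


if __name__ == "__main__":
    A = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
    B = [[1, 0, 3], [4, 5, 6], [0, 8, 0]]
    print(hamming_distance_baseline(make_row_ip_oracle(A), make_row_ip_oracle(B), 3))
```

Because the test is one-sided, the baseline can only undercount, and only with probability exponentially small in `trials`. Its query count grows with $n$ plus roughly ${\bf D}_{\bf M}({\bf A},{\bf B}) \log n$ tests, which is what the sub-linear bound stated above improves on.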

30 pages. Accepted in RANDOM'21

Related Organizations

Schloss Dagstuhl – Leibniz Center for Informatics
Germany
Indian Statistical Institute
India
Leibniz Association
Germany

Keywords

FOS: Computer and information sciences, Property testing, Dimensionality reduction, 004, Distance estimation, Computer Science - Data Structures and Algorithms, Data Structures and Algorithms (cs.DS), Sub-linear algorithms, ddc: ddc:004

Fields of Science

natural sciences

computer and information sciences