Audio alignment is a fundamental preprocessing step in many MIR pipelines. For two audio clips with M and N frames, respectively, the most popular approach, dynamic time warping (DTW), requires O(MN) memory and O(MN) computation, which is prohibitive for frame-level alignments at reasonable rates. To address this, a variety of memory-efficient algorithms exist that approximate the optimal alignment under the DTW cost. To our knowledge, however, no exact algorithms exist that are guaranteed to break the quadratic memory barrier. In this work, we present a divide-and-conquer algorithm that computes the exact, globally optimal DTW alignment using O(M+N) memory. Its runtime is still O(MN), trading memory for a 2x increase in computation. However, the algorithm can be parallelized up to a factor of min(M, N) under the same memory constraints, so with an adequate GPU it can still run faster than the textbook version. We use our algorithm to compute exact alignments on a collection of orchestral music, and we use these alignments as ground truth to benchmark the accuracy of several popular approximate alignment schemes at scales that were not previously possible.
12 Pages, 6 Figures, 1 Table, ISMIR 2020
FOS: Computer and information sciences, Computer Science - Machine Learning, Sound (cs.SD), H.5.5; H.3.3; F.2.1, Computer Science - Sound, Computer Science - Information Retrieval, Machine Learning (cs.LG), Multimedia (cs.MM), Audio and Speech Processing (eess.AS), FOS: Electrical engineering, electronic engineering, information engineering, Computer Science - Multimedia, Information Retrieval (cs.IR), Electrical Engineering and Systems Science - Audio and Speech Processing