
In deep learning research, many melody extraction models rely on redesigning neural network architectures to improve performance. In this paper, we propose an input feature modification and a training objective modification based on two assumptions. First, harmonics in the spectrograms of audio data decay rapidly along the frequency axis. To enhance the model's sensitivity to the trailing harmonics, we modify the Combined Frequency and Periodicity (CFP) representation using the discrete z-transform. Second, vocal and non-vocal segments of extremely short duration are uncommon. To ensure a more stable melody contour, we design a differentiable loss function that discourages the model from predicting such segments. We apply these modifications to several models, including MSNet, FTANet, and a newly introduced model, PianoNet, adapted from a piano transcription network. Our experimental results demonstrate that the proposed modifications are empirically effective for singing melody extraction.
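The second idea can be illustrated with a minimal sketch. This is an assumption-laden example, not the paper's actual loss: it supposes frame-wise voicing probabilities of shape `[batch, time]` and a hypothetical `short_segment_penalty` helper. Very short vocal/non-vocal flips deviate strongly from a moving average of the sequence, so penalizing that deviation is one differentiable way to discourage them:

```python
import torch
import torch.nn.functional as F

def short_segment_penalty(probs: torch.Tensor, kernel_size: int = 5) -> torch.Tensor:
    """Illustrative differentiable penalty on very short segments.

    probs: frame-wise voicing probabilities, shape [batch, time].
    A segment shorter than the kernel produces a large local gap
    between the raw sequence and its moving average; this term
    penalizes that gap, encouraging a more stable contour.
    """
    # Replicate-pad the edges so a constant sequence incurs zero penalty.
    pad = kernel_size // 2
    padded = F.pad(probs.unsqueeze(1), (pad, pad), mode="replicate")
    # Moving average over time smooths out brief flips.
    kernel = torch.ones(1, 1, kernel_size) / kernel_size
    smoothed = F.conv1d(padded, kernel).squeeze(1)
    # Mean squared disagreement between raw and smoothed sequences.
    return ((probs - smoothed) ** 2).mean()
```

In practice such a term would be added, with a small weight, to the main frame-wise melody loss; a perfectly constant (hence flip-free) sequence contributes nothing, while rapidly alternating predictions are penalized.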
7 pages, 4 figures, 2 tables, Proceedings of the 24th International Society for Music Information Retrieval Conference, ISMIR 2023
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Multimedia (cs.MM); Machine Learning (stat.ML); Audio and Speech Processing (eess.AS)
