Publication · Other literature type · Preprint · Conference object · 2017

Score-Informed Syllable Segmentation For A Cappella Singing Voice With Convolutional Neural Networks.

Pons Puig, Jordi; Gong, Rong; Serra, Xavier
Open Access
  • Published: 23 Oct 2017
  • Publisher: Zenodo
  • Country: Spain
Abstract
This paper introduces a new score-informed method for segmenting jingju a cappella singing phrases into syllables. The proposed method estimates the most likely sequence of syllable boundaries given the estimated syllable onset detection function (ODF) and the corresponding score. We first examine the structure of jingju syllables and propose a definition of the term “syllable onset”. We then identify the challenges that jingju a cappella singing poses. Further, we investigate how to improve the syllable ODF estimation with convolutional neural networks (CNNs). We propose a novel CNN architecture that allows us to efficiently capture differe...
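The abstract's core idea, choosing the most likely boundary sequence given a frame-wise ODF and the score, can be illustrated with a small dynamic-programming sketch. The snippet below is a minimal illustration only, not the authors' formulation: the decode_boundaries helper, the Gaussian prior on score-expected boundary positions, and the sigma width parameter are all assumptions introduced here for clarity.

    import numpy as np

    def decode_boundaries(odf, score_durations, sigma=0.1):
        """Hypothetical helper (not from the paper): pick syllable boundary frames.

        odf             -- 1-D array of onset strengths per frame, values in [0, 1]
        score_durations -- relative duration of each syllable taken from the score
        sigma           -- width of the Gaussian duration prior, as a fraction of the phrase
        Returns the frame indices of the internal syllable boundaries.
        """
        odf = np.asarray(odf, dtype=float)
        total_frames = len(odf)
        durations = np.asarray(score_durations, dtype=float)
        durations /= durations.sum()                          # normalise to proportions
        expected = np.cumsum(durations)[:-1] * total_frames   # score-expected boundary frames
        n_bound = len(expected)
        if n_bound == 0:                                      # single syllable: no internal boundary
            return []

        def emission(k, t):
            # ODF evidence plus a Gaussian prior centred on the score-expected position
            prior = -0.5 * ((t - expected[k]) / (sigma * total_frames)) ** 2
            return np.log(odf[t] + 1e-8) + prior

        # dp[k, t] = best score with boundary k placed at frame t (boundaries kept in order)
        dp = np.full((n_bound, total_frames), -np.inf)
        back = np.zeros((n_bound, total_frames), dtype=int)
        dp[0] = [emission(0, t) for t in range(total_frames)]
        for k in range(1, n_bound):
            best_prev = np.maximum.accumulate(dp[k - 1])      # best score over frames <= t
            arg_prev = np.zeros(total_frames, dtype=int)
            cur = 0
            for t in range(total_frames):
                if dp[k - 1, t] >= dp[k - 1, cur]:
                    cur = t
                arg_prev[t] = cur
            for t in range(k, total_frames):                  # boundary k needs k earlier frames
                dp[k, t] = best_prev[t - 1] + emission(k, t)
                back[k, t] = arg_prev[t - 1]

        # backtrack the best boundary sequence
        bounds = [int(np.argmax(dp[-1]))]
        for k in range(n_bound - 1, 0, -1):
            bounds.append(int(back[k, bounds[-1]]))
        return bounds[::-1]

For example, for a three-syllable phrase whose score durations are in the ratio 1:2:1, decode_boundaries(odf, [1, 2, 1]) returns two frame indices, favouring frames where the ODF is strong and which lie near the score-expected positions.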
Subjects
free text keywords: Computer Science - Sound; Música -- Informàtica (Music -- Informatics)
Funded by
EC| COMPMUSIC
Project
COMPMUSIC
Computational models for the discovery of the world's music
  • Funder: European Commission (EC)
  • Project Code: 267583
  • Funding stream: FP7 | SP2 | ERC
Download from (6 versions):
  • Zenodo: Other literature type, 2017 (Provider: Datacite)
  • ZENODO: Conference object, 2017 (Provider: ZENODO)

