A Hybrid Recurrent Neural Network For Music Transcription

Article / Preprint, English, Open Access
Sigtia, Siddharth ; Benetos, Emmanouil ; Boulanger-Lewandowski, Nicolas ; Weyde, Tillman ; Garcez, Artur S. d'Avila ; Dixon, Simon (2014)
  • Subjects: Computer Science - Learning; arXiv: Computer Science::Sound

We investigate the problem of incorporating higher-level, symbolic, score-like information into Automatic Music Transcription (AMT) systems to improve their performance. We use recurrent neural networks (RNNs) and their variants as music language models (MLMs), and present a generative architecture for combining these models with the predictions of a frame-level acoustic classifier. We also compare different neural network architectures for acoustic modelling. The proposed model computes a distribution over possible output sequences given the acoustic input signal, and we present an algorithm for performing a global search for good candidate transcriptions. The performance of the proposed model is evaluated on piano music from the MAPS dataset, and we observe that it consistently outperforms existing transcription methods.
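The combination described in the abstract — per-frame acoustic predictions rescored by a language-model prior, with a global search over candidate sequences — can be illustrated by a toy beam search. This is a simplified sketch, not the paper's algorithm: the function names are invented, the acoustic classifier is stood in for by an array of independent per-note probabilities, the MLM by an arbitrary log-score callback, and the search enumerates all 2^N note combinations per frame, which is only feasible for toy N.

```python
import itertools
import numpy as np

def beam_search(acoustic_probs, lm_score, beam_width=4):
    """Toy global search over binary piano-roll frames.

    acoustic_probs : (T, N) array of independent per-note probabilities
                     (stand-in for a frame-level acoustic classifier).
    lm_score       : callable(prev_frame, frame) -> log-probability of
                     `frame` given `prev_frame` (stand-in for an RNN MLM).
    Returns the highest-scoring (sequence, log_score) found by the beam.
    """
    T, N = acoustic_probs.shape
    # Enumerate all 2^N candidate frames -- only viable for tiny N;
    # practical systems sample or prune candidates instead.
    candidates = [np.array(bits) for bits in itertools.product([0, 1], repeat=N)]
    beam = [([], 0.0)]  # (partial sequence of frames, cumulative log-score)
    for t in range(T):
        expanded = []
        for seq, score in beam:
            prev = seq[-1] if seq else np.zeros(N)
            for frame in candidates:
                # Acoustic log-likelihood of this frame assignment.
                p = np.where(frame == 1, acoustic_probs[t], 1.0 - acoustic_probs[t])
                acoustic = np.sum(np.log(p + 1e-12))
                expanded.append((seq + [frame],
                                 score + acoustic + lm_score(prev, frame)))
        expanded.sort(key=lambda pair: pair[1], reverse=True)
        beam = expanded[:beam_width]  # keep only the best hypotheses
    return beam[0]
```

With a flat language model (`lm_score` returning 0), the search reduces to thresholding each note probability at 0.5; the MLM term is what lets temporal structure override locally ambiguous acoustic evidence.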
