ZENODO · Conference object · 2025 · License: CC BY · Data sources: ZENODO
Fine-Grained MIDI Expression Transcription from Wind and String Instrument Audio via Sim2Real Transfer Learning

Authors: Xie, Yifan; Guo, Zixun; Barthet, Mathieu


Abstract

While MIDI velocity estimation in piano music transcription has been widely studied, similar work for other instruments remains underexplored. Unlike piano MIDI velocity, which provides note-level volume modulation, MIDI Expression (CC11) provides continuous volume modulation across a note’s duration, requiring finer temporal resolution. This paper addresses the task of estimating MIDI CC11 values from wind and string instrument audio recordings. To explore suitable estimation methods, we first investigate the numerical relationship between MIDI CC11 and audio Root Mean Square (RMS) energy. Motivated by the analysis results of the MIDI CC11–RMS relationship, we compare three estimation approaches: linear, quadratic, and BiLSTM-based deep learning. We adopt a Simulation-to-Reality (Sim2Real) strategy, training models on synthetic audio rendered from randomized MIDI CC11 curves and evaluating on real performance recordings. Unlike approaches requiring manually labeled data, ours relies entirely on synthetic training, avoiding the need for expert annotation. Experiments on violin, viola, flute, and trumpet demonstrate the effectiveness of the Sim2Real approach, with the deep learning model achieving the best performance. Using the deep learning model, we generate a MIDI dataset enriched with fine-grained MIDI CC11 annotations, which can be used for future expressive music analysis, modeling, or generation. All transcribed data are available online.
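The abstract describes fitting linear and quadratic maps from frame-level audio RMS energy to MIDI CC11 values. A minimal sketch of that idea follows; it is illustrative only, not the paper's implementation, and the frame length, hop size, and toy signal are assumptions introduced here.

```python
import numpy as np

def frame_rms(audio, frame_len=1024, hop=512):
    """Frame-level RMS energy of a mono audio signal."""
    n_frames = 1 + max(0, (len(audio) - frame_len) // hop)
    rms = np.empty(n_frames)
    for i in range(n_frames):
        frame = audio[i * hop : i * hop + frame_len]
        rms[i] = np.sqrt(np.mean(frame ** 2))
    return rms

def fit_cc11_from_rms(rms, cc11, degree=1):
    """Least-squares polynomial map from RMS to CC11.

    degree=1 gives the linear model, degree=2 the quadratic one.
    """
    coeffs = np.polyfit(rms, cc11, degree)
    return np.poly1d(coeffs)

# Toy example (hypothetical data): a crescendo whose CC11 curve
# rises in direct proportion to the frame RMS.
t = np.linspace(0, 1, 44100)
envelope = 0.1 + 0.8 * t                     # rising amplitude
audio = envelope * np.sin(2 * np.pi * 440 * t)
rms = frame_rms(audio)
cc11 = np.clip(127 * rms / rms.max(), 0, 127)  # stand-in "ground truth"
model = fit_cc11_from_rms(rms, cc11, degree=1)
pred = np.clip(model(rms), 0, 127)
```

Because the toy CC11 curve is exactly linear in RMS, the degree-1 fit recovers it almost perfectly; on real recordings the paper finds the BiLSTM model outperforms both polynomial fits.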

Keywords

Automatic music transcription, MIDI Expression (CC11), Deep learning, Expressive music performance

Impact indicators (provided by BIP!):
  • Selected citations (derived from selected sources; an alternative to "Influence"): 0
  • Popularity (current attention in the research community, based on the underlying citation network): Average
  • Influence (overall/total impact, based on the underlying citation network, diachronically): Average
  • Impulse (initial momentum directly after publication, based on the underlying citation network): Average