
While MIDI velocity estimation has been widely studied for piano music transcription, comparable work for other instruments remains underexplored. Unlike piano MIDI velocity, which assigns a single volume value to each note, MIDI Expression (CC11) modulates volume continuously over a note’s duration and therefore requires finer temporal resolution. This paper addresses the task of estimating MIDI CC11 values from recordings of wind and string instruments. To identify suitable estimation methods, we first investigate the numerical relationship between MIDI CC11 and the Root Mean Square (RMS) energy of the audio. Motivated by this analysis, we compare three estimation approaches: a linear mapping, a quadratic mapping, and a BiLSTM-based deep learning model. We adopt a Simulation-to-Reality (Sim2Real) strategy, training models on synthetic audio rendered from randomized MIDI CC11 curves and evaluating them on real performance recordings. Because training relies entirely on synthetic data, our approach avoids the expert annotation required by methods that depend on manually labeled data. Experiments on violin, viola, flute, and trumpet demonstrate the effectiveness of the Sim2Real approach, with the deep learning model achieving the best performance. Using this model, we generate a MIDI dataset enriched with fine-grained MIDI CC11 annotations, which can support future research in expressive music analysis, modeling, and generation. All transcribed data are available online.
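The two non-learned baselines can be illustrated with a short sketch. It computes frame-wise RMS energy and fits CC11 as a degree-1 (linear) or degree-2 (quadratic) polynomial of an RMS-derived feature by least squares. The frame length, hop length, dB conversion, and the helper names `frame_rms`, `rms_to_db`, `fit_cc11_baseline`, and `predict_cc11` are hypothetical choices made for illustration. The paper's actual feature scaling and fitting procedure may differ, and the BiLSTM model is not sketched here.

```python
import numpy as np


def frame_rms(signal: np.ndarray, frame_length: int = 2048, hop_length: int = 512) -> np.ndarray:
    """RMS energy of each analysis frame (no padding; hypothetical defaults)."""
    if len(signal) < frame_length:
        raise ValueError("signal is shorter than one frame")
    n_frames = 1 + (len(signal) - frame_length) // hop_length
    frames = np.stack(
        [signal[i * hop_length : i * hop_length + frame_length] for i in range(n_frames)]
    )
    return np.sqrt(np.mean(frames ** 2, axis=1))


def rms_to_db(rms: np.ndarray, eps: float = 1e-10) -> np.ndarray:
    """Convert linear RMS to decibels (assumed feature scaling)."""
    return 20.0 * np.log10(rms + eps)


def fit_cc11_baseline(feature: np.ndarray, cc11: np.ndarray, degree: int) -> np.ndarray:
    """Least-squares polynomial fit: degree=1 is linear, degree=2 is quadratic."""
    return np.polyfit(feature, cc11, degree)


def predict_cc11(coeffs: np.ndarray, feature: np.ndarray) -> np.ndarray:
    """Evaluate the fitted polynomial and clip to the valid MIDI CC range 0..127."""
    return np.clip(np.polyval(coeffs, feature), 0, 127)
```

In a Sim2Real setting, `feature` and `cc11` would come from synthetic audio whose CC11 curve is known. The fitted coefficients would then be applied to RMS features extracted from real recordings. Clipping keeps predictions within the 7-bit controller range even when the polynomial extrapolates.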
Automatic music transcription, MIDI Expression (CC11), Deep learning, Expressive music performance
