The model expects a raw audio signal as input and outputs predictions for arousal, dominance, and valence, each in a range of approximately 0...1. It also provides the pooled states of the last transformer layer. The model was created by fine-tuning a pre-trained wav2vec 2.0 model on MSP-Podcast (v1.7). As foundation we use wav2vec2-large-robust, released by Facebook under Apache 2.0, which we pruned from 24 to 12 transformer layers before fine-tuning. The model was afterwards exported to ONNX format. Further details are given in the associated paper. For an introduction to using the model, please visit our tutorial project. The original [Torch](https://pytorch.org/docs/stable/torch.html) model is hosted on Hugging Face.
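As a rough illustration of the input/output contract described above, the sketch below prepares a raw mono signal for the exported ONNX model. The tensor names, model path, and normalization step are assumptions, not taken from the release; inspect the exported model (e.g. via `session.get_inputs()`) and the tutorial project for the authoritative details.

```python
import numpy as np

SAMPLING_RATE = 16000  # wav2vec 2.0 models expect 16 kHz mono audio


def preprocess(signal: np.ndarray) -> np.ndarray:
    """Cast a raw mono signal to float32, peak-normalize it (a placeholder
    for whatever preprocessing the released processor defines), and add a
    batch dimension so the shape is (1, num_samples)."""
    signal = signal.astype(np.float32)
    peak = np.abs(signal).max()
    if peak > 0:
        signal = signal / peak
    return signal[np.newaxis, :]


# One second of silence as a stand-in for real speech.
batch = preprocess(np.zeros(SAMPLING_RATE))
print(batch.shape)  # (1, 16000)

# Running the exported ONNX model might then look like this
# (path and tensor names are illustrative only):
#
# import onnxruntime as ort
# session = ort.InferenceSession("model.onnx")
# hidden_states, logits = session.run(None, {"signal": batch})
# arousal, dominance, valence = logits[0]  # each roughly in 0...1
```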
Valence, Speech emotion recognition, wav2vec 2.0, ONNX, Transformer model, Deep learning, MSP-Podcast, Arousal, Dominance
| Indicator | Description | Value |
| --- | --- | --- |
| Selected citations | Citations derived from selected sources; an alternative to the "Influence" indicator, which reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically). | 2 |
| Popularity | Reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network. | Average |
| Influence | Reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically). | Average |
| Impulse | Reflects the initial momentum of an article directly after its publication, based on the underlying citation network. | Average |
| Views | | 921 |
| Downloads | | 3K |

Views provided by UsageCounts
Downloads provided by UsageCounts