Model for Age and Gender Prediction based on Wav2vec 2.0

The model expects a raw audio signal as input and outputs an age score in the range 0 to 1 (corresponding to 0-100 years) and gender predictions (female, male, child). It also provides the pooled states of the last transformer layer. The model was created by fine-tuning a pre-trained wav2vec 2.0 model. As the foundation, we used wav2vec2-large-robust, released by Facebook under the Apache 2.0 license. We provide two models: one with all 24 transformer layers and a stripped-down version with six transformer layers. Both models were exported to ONNX format. For training, we used aGender, Mozilla Common Voice, TIMIT and VoxCeleb2. For each database, we provide file lists for the splits (train, dev, test) in audformat. The CSV files can be loaded as a pandas.DataFrame with audformat.utils.read_csv(). Further details are given in the associated paper (tba). For an introduction on how to use the model, please visit our tutorial project.
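The output format described above can be illustrated with a minimal post-processing sketch. It is not taken from the model's own code: the function names, the clamping of the age score, and the assumption that the three gender values arrive in the order female, male, child (with the highest value taken as the prediction) are all hypothetical and should be checked against the tutorial project.

```python
from typing import Dict, Sequence

# Assumed label order, taken from the order in the description
# (female, male, child); the actual model output order may differ.
GENDER_LABELS = ("female", "male", "child")


def age_score_to_years(score: float) -> float:
    """Map an age score in 0..1 to years in 0..100.

    Clamping out-of-range scores is an assumption made for this sketch.
    """
    clamped = min(max(score, 0.0), 1.0)
    return clamped * 100.0


def interpret_prediction(
    age_score: float,
    gender_scores: Sequence[float],
) -> Dict[str, object]:
    """Turn raw outputs into an age in years and a gender label.

    `gender_scores` is assumed to hold one value per label in
    GENDER_LABELS; the label with the highest value is selected.
    """
    if len(gender_scores) != len(GENDER_LABELS):
        raise ValueError("expected one score per gender label")
    best = max(range(len(gender_scores)), key=lambda i: gender_scores[i])
    return {
        "age_years": age_score_to_years(age_score),
        "gender": GENDER_LABELS[best],
    }
```

For example, `interpret_prediction(0.25, [0.1, 0.7, 0.2])` would yield an age of 25 years and the label `male` under these assumptions.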

Related Organizations

Universität Augsburg
Germany

Keywords

wav2vec 2.0, Age prediction, Gender detection, ONNX, Transformer model, Deep learning
