
handle: 1959.4/unsworks_79713
The processing of speech as an explicit sequence of events is common in automatic speech recognition (linguistic events), but has received relatively little attention in paralinguistic speech classification despite its potential for characterizing broad acoustic event sequences. This paper proposes a framework for analyzing speech as a sequence of acoustic events, and investigates its application to depression detection. In this framework, acoustic space regions are tokenized into ‘words’ representing speech events at fixed or irregular intervals. This tokenization allows acoustic word features to be exploited using proven natural language processing methods. A key advantage of this framework is its ability to accommodate heterogeneous event types: herein we combine acoustic words and speech landmarks, which are articulation-related speech events. Another advantage is the option to fuse such heterogeneous events at various levels, including the embedding level. Evaluation on both controlled, laboratory-grade, supervised audio recordings and unsupervised, self-administered smartphone recordings highlights the merits of the proposed framework across both datasets, with the proposed landmark-dependent acoustic words achieving improvements in F1(depressed) of up to 15% and 13% for SH2-FS and DAIC-WOZ respectively, relative to acoustic speech baseline approaches.
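The tokenization idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the nearest-centroid codebook, the two-dimensional toy frames, the `w<k>` word labels, and the helper names `nearest_centroid`, `tokenize`, and `ngram_counts` are all assumptions made for illustration. Each frame is mapped to the closest region of the acoustic space, and the resulting word sequence is summarized with n-gram counts, one simple feature borrowed from natural language processing.

```python
import math
from collections import Counter


def nearest_centroid(frame, codebook):
    """Return the index of the codebook entry closest to the frame (Euclidean)."""
    return min(range(len(codebook)), key=lambda k: math.dist(frame, codebook[k]))


def tokenize(frames, codebook):
    """Map each feature frame to a hypothetical acoustic 'word' label such as 'w1'."""
    return [f"w{nearest_centroid(frame, codebook)}" for frame in frames]


def ngram_counts(tokens, n=2):
    """Count n-grams over the token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


# Toy codebook: three regions of an assumed 2-D acoustic feature space.
codebook = [(0.0, 0.0), (1.0, 1.0), (4.0, 0.0)]

# Toy feature frames taken at fixed intervals (illustrative values only).
frames = [(0.1, -0.1), (0.9, 1.2), (1.1, 0.8), (3.8, 0.2), (0.2, 0.1)]

tokens = tokenize(frames, codebook)
bigrams = ngram_counts(tokens, n=2)

print(tokens)
print(bigrams)
```

In practice the codebook would be learned from data (for example by clustering), and the frames would be real acoustic features rather than hand-picked points; both choices are left open here.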
anzsrc-for: 1005 Communications Technologies, anzsrc-for: 4608 Human-Centred Computing, Depression, anzsrc-for: 46 Information and Computing Sciences, anzsrc-for: 4603 Computer vision and multimedia computation, Mental Illness, Brain Disorders, 004, Mental Health, Clinical Research, Behavioral and Social Science, anzsrc-for: 0801 Artificial Intelligence and Image Processing, anzsrc-for: 4006 Communications engineering, anzsrc-for: 0906 Electrical and Electronic Engineering
| selected citations | 31 |
| popularity | Top 10% |
| influence | Top 10% |
| impulse | Top 10% |
