Parallel Representation Learning for the Classification of Pathological Speech: Studies on Parkinson’s Disease and Cleft Lip and Palate

Type: Article, Other literature type
Date: 01 Sep 2020
Language: English
Publisher: Elsevier BV
Journal: Speech Communication, volume 122, pages 56-67 (ISSN: 0167-6393)
Funded by: EC | TAPAS

Authors: Juan Camilo Vásquez-Correa; Tomás Arias‐Vergara; Martin J. Schuster; Juan Rafael Orozco-Arroyave; Elmar Nöth;

doi: 10.1016/j.specom.2020.07.005 , 10.60692/0za0w-64y65 , 10.60692/3e7yx-0yd64

Abstract

Speech signals may carry paralinguistic information, such as the presence of pathologies that impair a speaker's ability to communicate. These speech disorders have different origins depending on the type of disease: some are morphological, such as cleft lip and palate, which causes hypernasality; others are neurodegenerative, such as Parkinson's disease, which produces hypokinetic dysarthria. Automatic assessment of pathological speech can support diagnosis and/or the evaluation of disease severity. Conventional methods rely on the manually applied assessment of single features such as jitter, shimmer, or formant frequencies, which may not fully model all of the phenomena that appear due to the disease. This paper introduces a novel strategy based on unsupervised representation learning for the automatic detection of pathological speech. The proposed approach uses recurrent and convolutional autoencoders trained to extract informative features that characterize the presence of pathologies in speech. A novel feature set based on the reconstruction error of the autoencoders is also proposed. The models are evaluated by classifying pathological speech signals recorded from people with Parkinson's disease and from children with cleft lip and palate. All participants in this study were native Spanish speakers. The proposed models classify the speech signals of both kinds of diseases accurately, with an accuracy of up to 97% for cleft lip and palate and up to 84% for Parkinson's disease. We also show that the reconstruction error of the autoencoders in different frequency regions contains information related to specific speech symptoms of both diseases.
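
The idea of using the reconstruction error in different frequency regions as a feature set can be illustrated with a minimal sketch. The abstract does not specify the autoencoder architectures, the input representation, or how the frequency regions are defined, so everything here is an assumption: the input is taken to be a spectrogram of shape (frequency bins, frames), the bands are equal-width splits of the frequency axis, and a truncated SVD (the hypothetical `low_rank_reconstruction`) stands in for a trained autoencoder. The function names are illustrative and do not come from the paper.

```python
import numpy as np


def band_reconstruction_error(spec, recon, n_bands=4):
    """Mean squared reconstruction error per frequency band.

    spec, recon: arrays of shape (n_freq_bins, n_frames).
    Returns an array of length n_bands (one feature per band).
    """
    spec = np.asarray(spec, dtype=float)
    recon = np.asarray(recon, dtype=float)
    if spec.shape != recon.shape:
        raise ValueError("spec and recon must have the same shape")
    sq_err = (spec - recon) ** 2
    # Assumption: bands are contiguous, roughly equal splits of the frequency axis.
    bands = np.array_split(sq_err, n_bands, axis=0)
    return np.array([band.mean() for band in bands])


def low_rank_reconstruction(spec, rank=2):
    """Hypothetical stand-in for a trained autoencoder: truncated SVD."""
    u, s, vt = np.linalg.svd(spec, full_matrices=False)
    return (u[:, :rank] * s[:rank]) @ vt[:rank]


# Toy data in place of a real spectrogram (16 frequency bins, 10 frames).
rng = np.random.default_rng(0)
spec = rng.random((16, 10))
recon = low_rank_reconstruction(spec, rank=2)
features = band_reconstruction_error(spec, recon, n_bands=4)
print(features)  # one error value per frequency band
```

In a real pipeline the reconstruction would come from the trained recurrent or convolutional autoencoder, and the per-band errors would be passed to a classifier alongside the learned representation.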

Related Organizations

University of Erlangen-Nuremberg (Germany)
Ludwig-Maximilians-Universität München (Germany)
University of Antioquia (Colombia)

Keywords

Artificial intelligence, Physiology, Feature (linguistics), Health Professions, Social Sciences, Experimental and Cognitive Psychology, Diagnosis and Treatment of Voice Disorders, Dysphagia and Swallowing Disorders, Speech Therapy, Speech recognition, Pattern recognition (psychology), Speech and Hearing, Acoustic Analysis, Phonetics, Health Sciences, Pathology, Psychology, Disease, Speech disorder, Vowel, Dysarthria, Linguistics, Audiology, Computer science, FOS: Philosophy, ethics and religion, FOS: Psychology, Philosophy, Speech Perception and Phonetics, Formant, FOS: Biological sciences, Speech Perception, FOS: Languages and literature, Medicine, Articulatory Phonetics

Impact by BIP!

selected citations: 35. These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
popularity: Top 10%. This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
influence: Top 10%. This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
impulse: Top 10%. This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.