LPP

Laboratoire de Phonétique et Phonologie
19 Projects
  • Funder: French National Research Agency (ANR) Project Code: ANR-14-CE35-0002
    Funder Contribution: 442,680 EUR
  • Funder: French National Research Agency (ANR) Project Code: ANR-15-CE23-0024
    Funder Contribution: 500,118 EUR

    The objective is to synthesize speech from text via numerical simulation of the human speech production process, i.e. its articulatory, aerodynamic, and acoustic aspects. Corpus-based approaches have come to dominate text-to-speech synthesis. They exploit speech databases of very good acoustic quality that cover a large number of expressions and phonetic contexts, which is sufficient to produce intelligible speech. However, these approaches face almost insurmountable obstacles as soon as parameters intimately related to the physical process of speech production have to be modified. By contrast, an approach resting on simulation of the physical speech production process makes explicit use of source parameters, of the anatomy and geometry of the vocal tract, and of a temporal supervision strategy. It thus offers direct control over the nature of the synthetic speech. The project is organized in 5 work packages:
    1. Aerodynamic and acoustic simulations, to produce a speech signal from knowledge of the transversal area at any point of all the cavities of the vocal tract (a minimal illustration of this mapping follows the consortium description below);
    2. Source and coordination scenarios, to coordinate the sources with the temporal evolution of the vocal tract, which is crucial for producing consonants that human listeners can identify;
    3. Supervision of the temporal evolution of the vocal tract geometry, to anticipate the production of upcoming sounds and generate realistic articulatory gestures;
    4. Acquisition of the speech production data needed to know the vocal fold activation, the aerodynamic parameters, and the geometrical shape of the vocal tract (via MRI at a high sampling rate);
    5. A general architecture to integrate the different levels and synthesize an acoustic signal from text.
    The development of realistic simulations of the speech production process will be a key asset for understanding the respective contributions of anatomical characteristics, coordination capabilities, and vocal fold control to the resulting speech signal. The scope of the project goes far beyond the comprehension of speech production phenomena: it concerns phonetics, motor control, and, within automatic speech processing, at least text-to-speech synthesis. There are a number of applications, concerning situations for which standard text-to-speech synthesis is not well suited, such as foreign language learning or language acquisition. The project also opens new perspectives in expressive speech synthesis, and thus for conversational agents. In the medical field, applications include MRI acquisition protocols with a sampling rate high enough for organs that deform quickly over time, speech production pathologies, and evaluating the impact of surgery on the vocal folds or vocal tract. We firmly believe that ArtSpeech will achieve major scientific and technical advances and will demonstrate the value of the physical approach, both in opening new research perspectives and in developing highly innovative applications in the domain of speech production in the broadest sense.
    The consortium consists of four remarkably complementary research teams with leading international theoretical and practical experience in the domains of: • aerodynamic and acoustic simulation of speech production, and modeling of the source and of the geometry of the vocal tract; • magnetic resonance imaging and other techniques for acquiring speech production data.
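
    As an aside on work package 1, the sketch below illustrates the classical lossless-tube route from an area function to an acoustic signal: the vocal tract is approximated as concatenated uniform tube sections, the junction reflection coefficients are converted to an all-pole filter via the step-up (Levinson) recursion, and the filter is excited by an impulse train. The area values, sampling rate, and pitch are invented for the example; the project's actual aerodynamic and acoustic simulations are far richer than this textbook equivalence.

        import numpy as np
        from scipy.signal import lfilter

        def areas_to_lpc(areas):
            """Convert a vocal-tract area function (glottis -> lips) into
            all-pole filter coefficients via the lossless-tube / LPC
            equivalence. Sign conventions vary across textbooks."""
            # reflection coefficient at each junction between adjacent sections
            k = [(areas[i + 1] - areas[i]) / (areas[i + 1] + areas[i])
                 for i in range(len(areas) - 1)]
            a = np.array([1.0])              # A(z) polynomial, a[0] = 1
            for ki in k:
                a = np.concatenate([a, [0.0]])
                a = a + ki * a[::-1]         # step-up (Levinson) recursion
            return a                         # synthesis filter is 1 / A(z)

        fs = 16000
        # toy area function in cm^2, roughly an open, /a/-like tract shape
        areas = [0.5, 0.7, 1.0, 1.6, 2.6, 4.0, 5.0, 5.5, 5.0, 4.0]
        a = areas_to_lpc(areas)

        source = np.zeros(fs)                # one second of "glottal" pulses
        source[::fs // 120] = 1.0            # 120 Hz impulse train
        speech = lfilter([1.0], a, source)   # excite the tube filter

    Because all tube sections have positive area, every reflection coefficient has magnitude below 1, so the resulting all-pole filter is guaranteed stable; this equivalence is what allows an articulatory description (the area function) to be turned directly into a digital filter.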

  • Funder: French National Research Agency (ANR) Project Code: ANR-22-CE19-0035
    Funder Contribution: 499,496 EUR

    Dynamic translaryngeal ultrasound (dTLUS), a non-invasive and inexpensive technique, has emerged in recent years as an alternative to nasofibroscopy, a minimally invasive method for assessing vocal cord paralysis. This paralysis is the major risk (occurring in 3 to 5% of cases) associated with cervical surgery (100,000 procedures per year in France). Our consortium's first studies demonstrated the performance of dTLUS after thyroid or parathyroid surgery for the early diagnosis of paralysis of one of the vocal cords. The objective of VOCALISE is to propose a new approach for better characterizing postoperative or radiation-induced dysphonia. It combines optimized dTLUS acquisitions of the vibration of each vocal cord during phonation with simultaneous voice/speech recordings. Software for analyzing the displacement of the arytenoids, surrogate markers of the vocal cords, will be developed to finely quantify the mobility of the laryngeal structures, combining classical motion-analysis methods with deep learning (a toy illustration of the classical side follows below). This approach will be evaluated: 1) to monitor speech therapy in patients with recurrent nerve injury, and 2) to qualify radiation-induced dysphonia in patients with head and neck cancers treated with radiotherapy. For this project, three academic partners with complementary skills, who have been collaborating for several years, have joined forces with two industrial partners, Mindray Medical France and Apteryx. Mindray will develop an acquisition module dedicated to the functional study of the vocal folds. Apteryx will develop software to quantify the movement of the laryngeal structures; this software will also support the follow-up and optimize the management of patients with dysphonia.
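
    As a toy illustration of the classical motion-analysis side, the sketch below tracks manually seeded landmarks across the frames of an ultrasound clip with pyramidal Lucas-Kanade optical flow (OpenCV). The file name, seed coordinates, and tracker settings are placeholders; in the project the landmarks would come from detection or deep-learning models rather than being hand-placed.

        import cv2
        import numpy as np

        cap = cv2.VideoCapture("ultrasound_clip.mp4")   # hypothetical file
        ok, frame = cap.read()
        prev_gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

        # two hand-seeded landmarks, e.g. left and right arytenoid regions
        pts = np.array([[[120.0, 200.0]], [[220.0, 205.0]]], dtype=np.float32)
        trajectory = [pts.reshape(-1, 2).copy()]

        while True:
            ok, frame = cap.read()
            if not ok:
                break
            gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
            # pyramidal Lucas-Kanade tracking of the seeded landmarks
            pts, status, err = cv2.calcOpticalFlowPyrLK(
                prev_gray, gray, pts, None, winSize=(21, 21), maxLevel=3)
            prev_gray = gray
            trajectory.append(pts.reshape(-1, 2).copy())

        traj = np.stack(trajectory)                    # (frames, points, 2)
        # per-frame displacement of each landmark relative to the first frame
        displacement = np.linalg.norm(traj - traj[0], axis=2)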

  • Funder: French National Research Agency (ANR) Project Code: ANR-19-CE38-0015
    Funder Contribution: 464,668 EUR

    The main objective of the CLD2025 project is to facilitate the urgent task of documenting endangered languages by leveraging the potential of computational methods. A breakthrough is now possible: machine learning tools (such as artificial neural networks and Bayesian models) have improved to the point where they can effectively help perform linguistic annotation tasks such as automatic transcription of audio recordings, automatic glossing of texts, and automatic word discovery. Thorough documentation of the world's dwindling linguistic diversity is much more feasible with these tools than under a manual workflow. For instance, manual transcription of 50 hours of speech (a sizeable fieldwork corpus) can take hundreds of hours of work, creating a bottleneck in the language documentation workflow. Another key task, referred to in linguistics as interlinear glossing (in a nutshell: word-by-word translation/annotation; see the toy sketch below), is even more time-consuming and is moreover difficult to perform manually with the required level of consistency. Models created through machine learning have the potential to help with these time-consuming and difficult tasks. But Natural Language Processing (NLP) remains little used in language documentation, for a variety of reasons: the technology is still new (and evolving rapidly), user-friendly interfaces are still under development, and there are few case studies demonstrating practical usefulness in a low-resource setting. Field linguists typically rely on manual methods throughout the documentation process. The objective of the CLD2025 project is therefore to enable the adoption of these techniques in the mid term (by 2025) through the co-construction of models and tools by field linguists and computational linguists, and the development of interfaces and systems that field linguists can actually use. We are building on the achievements of the BULB project in terms of corpora and modes of acquisition, as well as its models for transcription and segmentation. We are not developing corpora here, but rather focusing on how to exploit existing corpora. We address automatic processing problems (phoneme and tone transcription, unit discovery, automatic glossing), some of which are novel (tonal transcription, automatic glossing), validating them on endangered languages of very varied types: Mboshi (Bantu C25), Kakabe (Mande), a Sino-Tibetan language, Yongning Na (Mosuo), and three Nakh-Daghestanian languages, Khinalug, Kryz (Kryts), and Budugh. We will also carry the results of the improved automatic processing up to the level of linguistic work: the automatic speech and language processing mechanisms and results will be used to explore phonetic-phonological issues at the segmental, supra-segmental, and tonal levels of the languages addressed in the project. Finally, from the beginning of the project, the focus will be on the usability of the tools and models developed. This point highlights the fundamentally interdisciplinary nature of the work carried out here by computational scientists and field linguists. To this end, a recognized field linguist will work full-time on the project and will contribute, through her experience and expertise, to the definition, development, and evaluation of the different systems developed in the project.
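
    To make the interlinear glossing task concrete, here is a deliberately naive sketch: each morpheme is looked up in a lexicon and printed column-aligned beneath the source line. The lexicon entries and example sentence are invented; the project's glossing models are learned from annotated corpora rather than hand-written, and must handle ambiguity that a plain lookup cannot.

        def gloss(tokens, lexicon):
            """Print a two-line interlinear gloss: source morphemes on the
            first line, their glosses column-aligned beneath. Morphemes
            missing from the lexicon are glossed '???'."""
            glosses = [lexicon.get(t, "???") for t in tokens]
            widths = [max(len(t), len(g)) for t, g in zip(tokens, glosses)]
            print("  ".join(t.ljust(w) for t, w in zip(tokens, widths)))
            print("  ".join(g.ljust(w) for g, w in zip(glosses, widths)))

        # invented entries, loosely in the style of the Leipzig glossing rules
        lexicon = {"bo": "1SG", "mo": "see-PST", "zo": "dog", "li": "DEF"}
        gloss(["bo", "zo", "li", "mo"], lexicon)
        # bo   zo   li   mo
        # 1SG  dog  DEF  see-PST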

  • Funder: French National Research Agency (ANR) Project Code: ANR-12-BS02-0006
    Funder Contribution: 337,555 EUR

    The proposal aims at developing tools for the diagnosis, localization, and measurement of automatic transcription errors, and is based on a consortium of leading academic actors in this field. The objective is to study the errors in detail (at the perceptual, acoustic-phonetic, lexical, and syntactic levels) in order to provide a precise diagnosis of possible shortcomings of the current classical models on certain classes of linguistic phenomena. At the application level, the proposal is justified by an observation: a large number of applications in the field of content access from multimedia data are made possible by automatic transcription of speech: subtitling of broadcasts, search for specific excerpts in audio-visual archives, automated meeting minutes, and information extraction and structuring (speech analytics) from multimedia content (the Web, call centers, ...). However, their large-scale deployment is often slowed down by the fact that the transcriptions produced by automatic speech recognition systems contain too many errors. Research and development in speech recognition has so far focused, successfully, on improving the methods and models implemented in the transcription process, as measured by the word error rate (a minimal implementation of this metric is sketched below); beyond a certain performance level, however, the marginal cost of reducing the residual errors grows exponentially. Transcription errors therefore persist, and they are more or less troublesome depending on the application. Information retrieval is tolerant of errors (up to 30%), but systematic errors on certain named entities can be prohibitive. By contrast, subtitling and meeting transcription have very low tolerance for errors, and even word error rates that are very low with respect to the state of the art (below 5%) are too high for end users. Error processing is not limited to increasing the acceptability of applications based on automatic speech transcription. Error classification, impact measurement through perceptual tests, and error diagnosis of state-of-the-art systems are the crucial first stage in identifying the shortcomings of current models and preparing future generations of automatic speech recognition systems. Through close cooperation between complementary partners who excel in their fields, the proposal aims to set up an infrastructure for the detection, diagnosis, and qualitative measurement of errors, making it possible to create a virtuous circle of improvement for large and very large vocabulary continuous speech recognition systems.
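
    For reference, the word error rate mentioned above is the Levenshtein (edit) distance between the reference and hypothesis word sequences, divided by the reference length. A minimal self-contained implementation might look like the following (the example sentences are invented):

        def word_error_rate(reference, hypothesis):
            """WER = (substitutions + deletions + insertions) / reference
            length, computed by dynamic-programming alignment over words."""
            r, h = reference.split(), hypothesis.split()
            # d[i][j]: edit distance between r[:i] and h[:j]
            d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
            for i in range(len(r) + 1):
                d[i][0] = i
            for j in range(len(h) + 1):
                d[0][j] = j
            for i in range(1, len(r) + 1):
                for j in range(1, len(h) + 1):
                    sub = 0 if r[i - 1] == h[j - 1] else 1
                    d[i][j] = min(d[i - 1][j] + 1,        # deletion
                                  d[i][j - 1] + 1,        # insertion
                                  d[i - 1][j - 1] + sub)  # substitution/match
            return d[len(r)][len(h)] / len(r)

        print(word_error_rate("the cat sat on the mat",
                              "the cat sit on mat"))  # 2 errors / 6 words

    WER treats all errors alike, which is precisely the limitation the proposal targets: a transcript with 5% WER may still be unusable for subtitling if the errors fall on named entities.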
