
This project will provide scientific advances in benchmarking, object recognition, manipulation and human-robot interaction. We focus on sorting a complex, unstructured heap of unknown objects (resembling nuclear waste consisting of broken, deformed bodies) as an instance of an extremely complex manipulation task. The consortium aims to build an end-to-end benchmarking framework that includes a rigorous scientific methodology and experimental tools for application in realistic scenarios. Benchmark scenarios will be developed with off-the-shelf manipulators and grippers, resulting in an affordable setup that can be easily reproduced both physically and in simulation. We will develop benchmark scenarios of varying complexity, i.e., grasping and pushing irregular objects, grasping selected objects from the heap, identifying all object instances, and sorting the objects by placing them into corresponding bins. We will provide scanned CAD models of the objects that can be used for 3D printing in order to recreate our benchmark scenarios. Benchmarks with existing grasp planners and manipulation algorithms will be implemented as baseline controllers that are easily exchangeable using ROS. The ability of robots to handle dense clutter or a heap of unknown objects fully autonomously has been very limited due to challenges in scene understanding, grasping, and decision making. We will therefore rely on semi-autonomous approaches in which a human operator can interact with the system (e.g., via tele-operation, among other modalities) and give high-level commands to complement the autonomous skill execution. The degree of autonomy of our system will be adapted to the complexity of the situation. We will also benchmark our semi-autonomous task execution with different human operators and quantify the gap to the current state of the art in autonomous manipulation.
Building on our semi-autonomous control framework, we will develop a manipulation skill learning system that learns from the human operator's demonstrations and corrections and can therefore learn complex manipulations in a data-efficient manner. To improve object recognition and segmentation in cluttered heaps, we will develop new perception algorithms and investigate interactive perception in order to improve the robot's understanding of the scene in terms of object instances, categories and properties.
The REPERE challenge is an evaluation campaign of person recognition technologies for French audio-visual TV shows. Competitors will have to propose systems based on the various information sources present in the shows, which will be used to determine who appears in the images, who speaks, which people's names are pronounced or appear on screen, and to which people those names correspond. Addressing all these points will require a mix of skills: in speaker and face recognition (detection, segmentation, clustering) to determine people's biometric characteristics (voice, face); in speech recognition; in character recognition; and in natural language processing, in order to extract people's names and correctly associate them with people. Two laboratories are involved in this proposal: the Laboratoire d’Informatique de l’Université du Maine (LIUM), which will be the coordinator, and the Swiss research institute IDIAP. The partners' combined competences cover all the topics of the challenge. LIUM has been developing a powerful speaker diarization system since 2004 (2nd in the ESTER 2005 evaluation campaign, 1st in the ESTER 2 campaign in 2008). LIUM has also been working since 2006 on speaker identification using speech transcription to extract names from the recordings. For its part, IDIAP has developed competences in the automatic processing of audio and video data over many years. Within this project, IDIAP will focus mainly on people detection and recognition, as well as on character recognition (OCR). IDIAP has regularly taken part in NIST evaluation campaigns. It organized and participated in one of the tasks of the CLEAR (Classification of Events, Activities, and Relationships) evaluation in 2006 and 2007, and is currently in charge of a face and speaker recognition evaluation campaign at the international conference ICPR 2010.
The project will benefit from the results previously obtained by the partners, but it will require integration efforts in order to build a complete system that follows the main speakers throughout a show and names them. Research will focus on combining information coming from the various sources (acoustic signal, images, words and text). In addition, an original aspect of this project will be the recognition of people's roles (presenter, journalist or regulator, guest), of their relations and of their interactions (for example, who talks to whom), as well as the exploitation of these data to improve the performance of speaker/face diarization. For example, the interactions will help link the speakers to the people on screen, while the roles will facilitate speaker identification by making it possible to target regions that contain more useful information.
Standard machine learning systems require massive data and huge processing infrastructures, but the main obstacle to their wider adoption is the need for the rare, empirical knowledge of an experienced data scientist able to configure them and adjust their behavior over time. The ALLIES project will lay the foundation for the development of autonomous intelligent systems that sustain their performance across time. Such an unsupervised system will be able to update itself and perform self-evaluation so as to be aware of the evolution of its own knowledge acquisition. It should adapt to a changing environment by following a given learning scenario that balances the importance of performance on past and present data in order to avoid unwanted regression. Such systems cannot be developed without suitable metrics and protocols enabling their objective and reproducible evaluation. This evaluation should continuously assess performance on the given task and quantify the effort required to reach it, in terms of unsupervised data collected by the system and of interaction with humans in the case of active learning. The ALLIES project will develop, evaluate and disseminate these metrics and protocols. They will be available to European actors via an open evaluation platform dedicated to reproducible research. An evaluation campaign and a workshop will be organised to engage the community on this path. By publicly releasing the evaluation protocols and data, by releasing a dedicated evaluation platform and by developing autonomous systems for two tasks, machine translation and speaker diarization, we believe that the ALLIES project will boost the development of intelligent lifelong learning systems in Europe.
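As an illustration of how a learning scenario could balance performance on past and present data, the sketch below blends each system snapshot's score on the current epoch's data with its mean score on earlier epochs' data. This is only a minimal sketch of the idea; the weighting `alpha`, the score layout and the function name are assumptions for illustration, not the actual ALLIES metrics.

```python
def lifelong_score(scores, alpha=0.5):
    """Illustrative lifelong-learning score (NOT the ALLIES metric).

    scores[t] is a list of length t+1: performance of the system
    snapshot at time t, evaluated on the test data of epochs 0..t.
    For each snapshot we mix present performance with mean past
    performance (penalizing regression), then average over time.
    """
    total = 0.0
    for t, row in enumerate(scores):
        present = row[t]
        # Mean performance on all earlier epochs' data; for the
        # first snapshot there is no past, so use the present score.
        past = sum(row[:t]) / t if t > 0 else present
        total += alpha * present + (1 - alpha) * past
    return total / len(scores)
```

A system that improves on new data while regressing on old data is thus penalized through the `past` term, which is the kind of unwanted regression the learning scenario is meant to discourage.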
Speaker diarization is an unsupervised process that aims at identifying each speaker within an audio stream and determining when each speaker is active. It assumes that the number of speakers, their identities and their speech turns are all unknown. Speaker diarization has become a key technology in many domains such as content-based information retrieval, voice biometrics, forensics and social-behavioural analysis. Examples of applications include speech and speaker indexing, speaker recognition (in the presence of multiple speakers), speaker role detection, speech-to-text transcription, speech-to-speech translation and document content structuring. Although speaker diarization has been studied for almost two decades, current state-of-the-art systems suffer from many limitations. Such systems are extremely domain-dependent: for instance, a speaker diarization system trained on radio/TV broadcast news experiences drastically degraded performance when tested on a different type of recording such as radio/TV debates, meetings, lectures, conversational telephone speech or conversational voice-over-IP speech. Overlapping speech, spontaneous speaking style, background noise, music and other non-speech sources (laughter, applause, etc.) are all nuisance factors that severely degrade the quality of speaker diarization. Furthermore, most existing work addresses offline speaker diarization, in which the system has full access to the entire audio recording beforehand and no real-time processing is required. Multi-pass processing over the same data is therefore feasible and a range of elegant machine learning tools can be used. These compromises, however, are not admissible in real-time applications, particularly when it comes to public security and the fight against terrorism and cyber-criminality.
Moreover, after an initial step of segmentation into speech turns, most approaches address speaker diarization as a bag-of-speech-turns clustering problem and do not take into account the inherent temporal structure of interactions between speakers. One goal of the project is to integrate this information and rely on structured prediction techniques to improve over standard hierarchical clustering methods. Since our main application is related to the fight against cyber-criminality and public security, designing an online speaker diarization system is necessary. The focus on industrial research will therefore be supplemented by more fundamental research issues related to structured prediction and methods such as conditional random fields. Speaker diarization is inherently related to speaker recognition. In recent years, state-of-the-art speaker recognition systems have improved considerably thanks to the emergence of new recognition paradigms such as i-vectors and deep learning, new session compensation techniques such as probabilistic linear discriminant analysis, and new score normalization techniques such as adaptive symmetric score normalization. However, existing speaker diarization systems have not taken full advantage of these techniques. One goal of the project is therefore to adapt them to speaker diarization and thus fill this gap in the current literature. To evaluate the proposed algorithms and ensure their generality, different existing databases will be considered, such as the NIST SRE 2008 summed-channel telephone data, the NIST RT 2003-2004 conversational telephone data, the REPERE TV broadcast data and the AMI meeting corpus. Furthermore, we aim to collect a medium-size database suited to our main application, the fight against cyber-criminality.
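To make the "bag-of-speech-turns" view concrete, the following minimal sketch clusters speech-turn embeddings (e.g. i-vectors) with average-link agglomerative clustering under a cosine-similarity stopping threshold. The threshold value and similarity measure are illustrative assumptions; a real system would add PLDA scoring and calibration, and, as proposed in this project, exploit the temporal structure that this baseline ignores.

```python
import numpy as np

def agglomerative_diarization(embeddings, threshold=0.9):
    """Toy bag-of-speech-turns clustering (illustrative baseline).

    Each speech turn is represented by one embedding vector; the two
    most similar clusters are merged until no pair of clusters
    exceeds the cosine-similarity threshold.  Returns a list of
    clusters, each a list of speech-turn indices.
    """
    X = np.asarray(embeddings, dtype=float)
    X = X / np.linalg.norm(X, axis=1, keepdims=True)  # unit-normalize
    clusters = [[i] for i in range(len(X))]

    def sim(a, b):
        # Average-link cosine similarity between two clusters.
        return float(np.mean(X[a] @ X[b].T))

    while len(clusters) > 1:
        best, i, j = max(
            (sim(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        if best < threshold:
            break  # no pair is similar enough to merge
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters
```

Note that the order of the speech turns plays no role here, which is exactly the limitation the project's structured-prediction approach is intended to address.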
Heart pumping and shaping take place concomitantly during embryonic development. These two processes require a tight and dynamic coordination between mechanical forces and tissue morphogenesis. Importantly, in the adult, a cardiac disease or myocardial infarction can, sometimes very abruptly, alter cardiac contractility and cell composition. How mechanical parameters such as hemodynamic and tensile forces impact cardiac cells during disease is currently under intensive investigation. While the mammalian heart undergoes primarily fibrotic repair, animals such as the zebrafish regenerate their heart completely upon different types of injury. The recently accepted fact that cardiomyocytes can also proliferate in adult mammals, including humans, even if only to a limited extent, makes it particularly important to understand how mechanical forces influence cardiac regeneration in species with high regenerative capacity. To reach a comprehensive description of the organizational properties of different cardiac cell types during development and regeneration in a contracting organ such as the heart, we aim to study valve and epicardial formation based on a quantitative analysis of the biological and physical parameters operating at key steps of cardiogenesis. Our multidisciplinary approach will require expertise in biology, biomechanics, optics, and signal processing. We propose to develop high-resolution time-lapse imaging and optical approaches to collect biomechanical properties of the molecular, cellular and tissue dynamics underlying embryonic and adult heart development. This will be performed by visualizing embryonic and adult zebrafish hearts, with a specific focus on epicardium and valve formation. For the first time, we will assemble data at various scales to build a qualitative and quantitative analysis of the morphogenetic processes of heart valve and epicardium development at the cellular scale.
This analysis will also enable us to test the biomechanical properties of the regenerating valve by comparison with the normal valve. A significant advantage of our live imaging analysis is that it will allow the quantitative features observed in vivo to be challenged using in silico heart models, which we will develop in order to test the quality of heart function before and after regeneration.