
arXiv: 2312.07671
Incorporating human-like reactions can significantly enhance robots' acceptability among humans and their sociability. Humans react to environmental events quickly and without deliberation. One instance of such a natural reaction occurs when a sudden, loud sound startles or frightens a person: individuals may instinctively move their hands, turn toward the origin of the sound, and try to determine its cause. This innate behavior motivated us to explore a less-studied aspect of social robotics. In this work, we designed a multi-modal system composed of an action generator, a sound classifier, and a YOLO object detector that senses the environment, displays natural human fear reactions in the presence of sudden loud sounds, and then locates the fear-inducing sound source. The generated motions and inferences imitate intrinsic human reactions and enhance the sociability of robots. For motion generation, we propose a model based on LSTM and MDN networks that synthesizes varied motions. For sound detection, we adopt a transfer-learning model that takes the spectrogram of the sound signal as its input. After developing individual models for sound detection, motion generation, and image recognition, we integrated them into a comprehensive "fear" module implemented on the NAO robot. Finally, we tested the fear module in a practical setting, and two groups of participants, experts and non-experts in robotics, completed a questionnaire evaluating the robot's performance. The results indicate that the proposed module convinced participants that the NAO robot acts and reasons like a human when a sudden, loud sound occurs in its peripheral environment, and additionally showed that non-experts hold higher expectations of social robots and their performance.
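As context for the motion-generation approach summarized above, the core idea of a mixture density network (MDN) head is to predict the parameters of a Gaussian mixture from which the next pose is sampled, rather than a single deterministic pose. The following is a minimal, self-contained sketch of that sampling step; the function name, fixed toy parameters, and trajectory loop are illustrative assumptions, not the paper's implementation (in the paper, an LSTM would predict these parameters per timestep).

```python
import random

def sample_mdn(weights, means, sigmas, rng):
    """Draw one sample from a 1-D Gaussian mixture (an MDN output head).

    weights: mixture coefficients (sum to 1); means/sigmas: component params.
    Hypothetical helper -- a trained LSTM-MDN would emit these parameters.
    """
    # Pick a mixture component according to the weights...
    u, acc, k = rng.random(), 0.0, 0
    for k, w in enumerate(weights):
        acc += w
        if u <= acc:
            break
    # ...then draw from that component's Gaussian.
    return rng.gauss(means[k], sigmas[k])

# Toy autoregressive rollout of a short joint-angle trajectory.
rng = random.Random(0)
traj = [0.0]
for _ in range(5):
    # A real MDN conditions on the history via the LSTM; here the
    # component parameters are fixed toy values for illustration.
    traj.append(sample_mdn([0.6, 0.4],
                           [traj[-1], traj[-1] + 0.1],
                           [0.05, 0.05],
                           rng))
```

Sampling (rather than taking the most likely mean) is what lets such a model produce varied, non-repetitive reaction motions from the same input.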
16 pages, 11 figures
motion generation, FOS: Computer and information sciences, Computer Science - Machine Learning, Sound (cs.SD), Computer Science - Artificial Intelligence, 68T40, Image and Video Processing (eess.IV), deep learning, social robot, Electrical Engineering and Systems Science - Image and Video Processing, Computer Science - Sound, TK1-9971, Machine Learning (cs.LG), Emotion generation, Computer Science - Robotics, Artificial Intelligence (cs.AI), Audio and Speech Processing (eess.AS), FOS: Electrical engineering, electronic engineering, information engineering, human–robot interaction, Electrical engineering. Electronics. Nuclear engineering, Robotics (cs.RO), Electrical Engineering and Systems Science - Audio and Speech Processing
Citation indicators: selected citations: 0 · popularity: Average · influence: Average · impulse: Average
