
Infant engagement during guided play is a reliable indicator of early learning outcomes, psychiatric issues and familial wellbeing. An obstacle to using such information in real-world scenarios is the need for a domain expert to assess the data. We show that an end-to-end Deep Learning approach can perform well in automatic infant engagement detection from a single video source, without requiring a clear view of the face or the whole body. To tackle the problem of explainability in learning methods, we evaluate how four common attention mapping techniques can be used to perform subjective evaluation of the network’s decision process and identify multimodal cues used by the network to discriminate engagement levels. We further propose a quantitative comparison approach, by collecting a human attention baseline and evaluating its similarity to each technique.
Computer graphics and computer vision, explainable artificial intelligence, video analysis, social signals, end-to-end learning, infant engagement, Computer Vision and Robotics (Autonomous Systems), multimodal cues
