SmoothNet: A Plug-and-Play Network for Refining Human Poses in Videos

Publication type: Part of book or chapter of book, Article, Preprint, Conference object
Date: 01 Jan 2022
Embargo end date: 01 Jan 2021
Language: English
Publisher: Springer Nature Switzerland

Authors: Ailing Zeng; Lei Yang; Xuan Ju; Jiefeng Li; Jianyi Wang; Qiang Xu

doi: 10.1007/978-3-031-20065-6_36, 10.48550/arxiv.2112.13715

arXiv: 2112.13715

Abstract

When analyzing human motion videos, the output jitters from existing pose estimators are highly unbalanced, with estimation errors varying across frames. Most frames in a video are relatively easy to estimate and suffer only slight jitter. In contrast, for rarely seen or occluded actions, the estimated positions of multiple joints deviate substantially from the ground truth over a consecutive sequence of frames, producing significant jitter. To tackle this problem, we propose SmoothNet, a dedicated temporal-only refinement network that attaches to existing pose estimators to mitigate jitter. Unlike existing learning-based solutions that employ spatio-temporal models to co-optimize per-frame precision and temporal smoothness at all the joints, SmoothNet models the natural smoothness of body movements by learning the long-range temporal relations of every joint, without considering the noisy correlations among joints. With a simple yet effective motion-aware fully-connected network, SmoothNet significantly improves the temporal smoothness of existing pose estimators and, as a side effect, enhances estimation accuracy on the challenging frames. Moreover, as a temporal-only model, SmoothNet has a unique advantage: strong transferability across various types of estimators and datasets. Comprehensive experiments on five datasets with eleven popular backbone networks across 2D and 3D pose estimation and body recovery tasks demonstrate the efficacy of the proposed solution. Code is available at https://github.com/cure-lab/SmoothNet.
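
The temporal-only idea in the abstract can be illustrated with a minimal sketch. It is not the authors' architecture: the hypothetical function `refine_temporal_only` applies a single temporal window, shared by all channels, to each joint coordinate independently, and the window weights are fixed to a uniform average as a stand-in for learned fully-connected weights. The hypothetical helper `mean_acceleration` uses the mean second-order temporal difference as an assumed proxy for jitter, and the input sequence is synthetic.

```python
import math
import random


def refine_temporal_only(seq, weights):
    """Filter every channel independently along time with one shared window.

    seq:     list of frames; each frame is a list of C floats
             (flattened joint coordinates).
    weights: list of T floats (T odd). Assumption: a fixed stand-in for
             learned temporal weights.
    """
    n_frames, n_channels = len(seq), len(seq[0])
    half = len(weights) // 2
    out = []
    for t in range(n_frames):
        frame = []
        for c in range(n_channels):
            acc = 0.0
            for k, w in enumerate(weights):
                # Clamp indices at the sequence borders (edge padding).
                idx = min(max(t + k - half, 0), n_frames - 1)
                # Only channel c is read: no cross-joint (spatial) mixing.
                acc += w * seq[idx][c]
            frame.append(acc)
        out.append(frame)
    return out


def mean_acceleration(seq):
    """Mean absolute second-order temporal difference over all channels."""
    total, count = 0.0, 0
    for t in range(1, len(seq) - 1):
        for c in range(len(seq[0])):
            total += abs(seq[t + 1][c] - 2.0 * seq[t][c] + seq[t - 1][c])
            count += 1
    return total / count


# Synthetic two-channel trajectory with added Gaussian jitter.
rng = random.Random(0)
clean = [[math.sin(0.1 * t), math.cos(0.1 * t)] for t in range(200)]
noisy = [[v + rng.gauss(0.0, 0.05) for v in frame] for frame in clean]

window = [1.0 / 9] * 9  # uniform 9-frame window (assumed, not learned)
smoothed = refine_temporal_only(noisy, window)

print("jitter before:", mean_acceleration(noisy))
print("jitter after: ", mean_acceleration(smoothed))
```

Because the filter never mixes channels, the same window can be applied to any estimator's output regardless of how many joints or dimensions it produces, which is one way to read the transferability claim above.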

Accepted by ECCV 2022

Related Organizations

Chinese University of Hong Kong
China (People's Republic of)
Shanghai Jiao Tong University
Sensetime (China)
China (People's Republic of)
The Chinese University of Hong Kong
Hong Kong
Nanyang Technological University

Keywords

FOS: Computer and information sciences, Computer Vision and Pattern Recognition (cs.CV), Computer Science - Computer Vision and Pattern Recognition

Related research

Temporal cues for consonant recognition: Training, talker generalization, and use in evaluation of cochlear implants (1992; relation: IsAmongTopNSimilarDocuments)
VIBE software on GitHub (relation: IsRelatedTo)

External Databases

3dhp
3dpw

Impact by BIP!

selected citations: 56. These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
popularity: Top 1%. This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
influence: Top 10%. This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
impulse: Top 1%. This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.