Name: Towards Skeletal and Signer Noise Reduction in Sign Language Production via Quaternion-Based Pose Encoding and Contrastive Learning
Keywords: Machine Learning, FOS: Computer and information sciences, Deep Learning, Contrastive Learning, [INFO.INFO-CL] Computer Science [cs]/Computation and Language [cs.CL], Sign Language Production, [INFO.INFO-LG] Computer Science [cs]/Machine Learning [cs.LG], Computation and Language, Pose Encoding, Computation and Language (cs.CL)

Publication type: Article, Preprint, Conference object
Date: 16 Sep 2025
Embargo end date: 01 Jan 2025
Publisher: ACM
Journal: Adjunct Proceedings of the 25th ACM International Conference on Intelligent Virtual Agents

Authors: Guilhem Fauré; Mostafa Sadeghi; Sam Bigeard; Slim Ouni;

doi: 10.1145/3742886.3756728 , 10.48550/arxiv.2508.14574

arXiv: 2508.14574


Abstract

One of the main challenges in neural sign language production (SLP) lies in the high intra-class variability of signs, arising from signer morphology and stylistic variety in the training data. To improve robustness to such variations, we propose two enhancements to the standard Progressive Transformers (PT) architecture (Saunders et al., 2020). First, we encode poses using bone rotations in quaternion space and train with a geodesic loss to improve the accuracy and clarity of angular joint movements. Second, we introduce a contrastive loss to structure decoder embeddings by semantic similarity, using either gloss overlap or SBERT-based sentence similarity, aiming to filter out anatomical and stylistic features that do not convey relevant semantic information. On the Phoenix14T dataset, the contrastive loss alone yields a 16% improvement in Probability of Correct Keypoint over the PT baseline. When combined with quaternion-based pose encoding, the model achieves a 6% reduction in Mean Bone Angle Error. These results point to the benefit of incorporating skeletal structure modeling and semantically guided contrastive objectives on sign pose representations into the training of Transformer-based SLP models.
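To make the geodesic loss mentioned in the abstract concrete, the following is a minimal sketch of a geodesic distance between bone rotations expressed as unit quaternions. It is not the authors' implementation: the abstract does not state the exact form of their loss, so the formula used here (the rotation angle 2·arccos|⟨q1, q2⟩|), the (w, x, y, z) component order, and the function names `normalize`, `geodesic_angle`, and `mean_geodesic_loss` are all assumptions made for illustration.

```python
import math


def normalize(q):
    """Scale a quaternion (w, x, y, z) to unit length."""
    norm = math.sqrt(sum(c * c for c in q))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero quaternion")
    return tuple(c / norm for c in q)


def geodesic_angle(q1, q2):
    """Angle in radians of the rotation that takes q1 to q2.

    The absolute value of the dot product makes q and -q equivalent,
    since both encode the same 3D rotation.
    """
    q1, q2 = normalize(q1), normalize(q2)
    dot = abs(sum(a * b for a, b in zip(q1, q2)))
    dot = min(1.0, dot)  # guard against rounding slightly above 1
    return 2.0 * math.acos(dot)


def mean_geodesic_loss(predicted, target):
    """Average geodesic angle over paired bone rotations (hypothetical loss)."""
    if len(predicted) != len(target) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(geodesic_angle(p, t) for p, t in zip(predicted, target))
    return total / len(predicted)


if __name__ == "__main__":
    identity = (1.0, 0.0, 0.0, 0.0)
    # Rotation of 90 degrees about the z axis.
    quarter_turn_z = (math.cos(math.pi / 4), 0.0, 0.0, math.sin(math.pi / 4))
    print(geodesic_angle(identity, quarter_turn_z))
    print(mean_geodesic_loss([identity, identity], [identity, quarter_turn_z]))
```

Unlike a per-component squared error on quaternion coordinates, this distance depends only on the relative rotation between the two bones, which is one plausible reading of why a geodesic objective would suit angular joint movements.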

Related Organizations

French National Centre for Scientific Research
France
French Institute for Research in Computer Science and Automation
France
Inria Centre at Université de Lorraine
France
University of Lorraine
France

Keywords

Machine Learning, FOS: Computer and information sciences, Deep Learning, Contrastive Learning, [INFO.INFO-CL] Computer Science [cs]/Computation and Language [cs.CL], Sign Language Production, [INFO.INFO-LG] Computer Science [cs]/Machine Learning [cs.LG], Computation and Language, Pose Encoding, Computation and Language (cs.CL), Machine Learning (cs.LG)

Impact by BIP!

	selected citations: 0. These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
	popularity: Average. This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
	influence: Average. This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
	impulse: Average. This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.



Green

Related to Research communities

INRIA