Name: Application of weighted low rank approximations: outlier detection in a data matrix
Keywords: Exploratory analysis, Genotype-by-environment interaction, Q1-390, Research Note, Science (General), QH301-705.5, Atypical elements, R, Medicine, Data preprocessing

Type: Article, Other literature type
Date: 27 May 2025
Language: English
Publisher: Springer Science and Business Media LLC
Journal: BMC Research Notes, volume 18 (eISSN: 1756-0500)

Authors: Marisol García-Peña; Sergio Arciniegas-Alarcón; Kaye E. Basford

doi: 10.1186/s13104-025-07284-2

Abstract

OBJECTIVE: Identifying possible outliers is a mandatory step in the exploratory analysis of any rectangular database, because their presence determines which type of explanatory and/or predictive modeling should be used subsequently. This paper presents strategies to identify outliers in any data set using weighted low rank approximations of a matrix. The strategies are evaluated through artificial contamination of sixteen real data sets, two of which have multivariate characteristics and fourteen of which come from multi-environment trials. As an evaluation criterion, a statistic is proposed whose value is small when the detection method performs well and large when false positives or false negatives appear.

RESULTS: Six criteria for identifying outliers from weighted approximations were considered, including simple residuals, squared residuals with differential weights, jackknife, and their corresponding iterative versions. They were compared with the gold-standard method based on limits from a bias-adjusted boxplot. All methods are applicable to any numerical data set written in matrix form, e.g. experiments with genotype-by-environment interaction. In the presence of random outliers in a matrix with numerical entries, identifying outliers using weighted approximations was found to be more effective than detection based on limits from a bias-adjusted boxplot.
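The general idea of residual-based detection from a weighted approximation can be illustrated with a minimal sketch. It is not the authors' procedure: it assumes a rank-1 approximation fitted by alternating weighted regressions (in the spirit of criss-cross regression), an iterative scheme that gives zero weight to flagged cells, and a cutoff of `k` robust standard deviations based on the median absolute deviation. The function names `weighted_rank1` and `detect_outliers`, the cutoff `k=3.0`, and the toy matrix are all hypothetical choices made for this illustration.

```python
import statistics


def weighted_rank1(X, W, iters=200, tol=1e-12):
    """Fit X ~ a b^T by minimizing sum_ij W[i][j] * (X[i][j] - a[i]*b[j])**2.

    Alternates weighted regressions of columns on a and of rows on b.
    Assumption: starting from a = (1, ..., 1) is adequate for the data.
    """
    n, p = len(X), len(X[0])
    a, b = [1.0] * n, [1.0] * p
    prev = float("inf")
    for _ in range(iters):
        for j in range(p):  # update column scores with row scores fixed
            den = sum(W[i][j] * a[i] ** 2 for i in range(n))
            if den > 0:
                b[j] = sum(W[i][j] * a[i] * X[i][j] for i in range(n)) / den
        for i in range(n):  # update row scores with column scores fixed
            den = sum(W[i][j] * b[j] ** 2 for j in range(p))
            if den > 0:
                a[i] = sum(W[i][j] * b[j] * X[i][j] for j in range(p)) / den
        loss = sum(W[i][j] * (X[i][j] - a[i] * b[j]) ** 2
                   for i in range(n) for j in range(p))
        if prev - loss < tol:  # stop when the weighted loss stalls
            break
        prev = loss
    return a, b


def detect_outliers(X, k=3.0, max_rounds=10):
    """Iteratively flag cells whose residual is far from the robust center.

    Flagged cells get weight 0 in the next fit, so they no longer
    distort the approximation (hypothetical rule for this sketch).
    """
    n, p = len(X), len(X[0])
    cells = [(i, j) for i in range(n) for j in range(p)]
    flagged = set()
    for _ in range(max_rounds):
        W = [[0.0 if (i, j) in flagged else 1.0 for j in range(p)]
             for i in range(n)]
        a, b = weighted_rank1(X, W)
        resid = {(i, j): X[i][j] - a[i] * b[j] for (i, j) in cells}
        clean = [resid[c] for c in cells if c not in flagged]
        med = statistics.median(clean)
        mad = statistics.median([abs(r - med) for r in clean])
        scale = 1.4826 * mad or 1e-12  # guard against a zero scale
        new = {c for c in cells if abs(resid[c] - med) > k * scale}
        if new == flagged:  # flagged set is stable
            break
        flagged = new
    return flagged


# Toy example: a noisy rank-1 matrix with one contaminated cell at (2, 3).
rows = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
cols = [1.0, 0.5, 2.0, 1.5, 3.0]
X = [[r * c + 0.01 * ((7 * i + 3 * j) % 5 - 2) for j, c in enumerate(cols)]
     for i, r in enumerate(rows)]
X[2][3] += 10.0
flagged = detect_outliers(X)
```

In this sketch `flagged` is a set of `(row, column)` index pairs. Depending on the cutoff and the data, cells sharing a row or column with a true outlier may also be flagged, which is the kind of false positive an evaluation statistic such as the one proposed in the paper is meant to penalize.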

Related Organizations

University of Queensland
Australia
Terrestrial Ecosystem Research Network
Australia
Pontificia Universidad Javeriana
Colombia
University of La Sabana
Colombia

Keywords

Exploratory analysis, Genotype-by-environment interaction, Q1-390, Research Note, Science (General), QH301-705.5, Atypical elements, R, Medicine, Data preprocessing, Biology (General), Criss-cross regression

Impact by BIP!

	selected citations: 0. These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
	popularity: Average. This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
	influence: Average. This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
	impulse: Average. This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.

Access routes: Green, gold