Publication: Article, 01 Sep 2020
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Journal: IEEE Transactions on Pattern Analysis and Machine Intelligence, volume 42, pages 2240-2256 (issn: 0162-8828, eissn: 1939-3539)

Authors: Kui Yu; Lin Liu; Jiuyong Li; Wei Ding; Thuc Duy Le

doi: 10.1109/tpami.2019.2908373

pmid: 30946660

handle: 11541.2/143244

Multi-Source Causal Feature Selection

Abstract

Causal feature selection has attracted much attention in recent years, as the selected causal features imply the causal mechanism related to the class attribute, leading to more reliable prediction models built on them. There is currently a need to develop multi-source feature selection methods, since in many applications data for studying the same problem has been collected from various sources, such as multiple gene expression datasets obtained from different experiments for studying the causes of the same disease. However, state-of-the-art causal feature selection methods generally handle a single dataset, and applying them directly to multiple datasets yields unreliable results because the datasets may have different distributions. To address these challenges, we use the concept of causal invariance from causal inference: we first formulate causal feature selection with multiple datasets as a search for an invariant set across the datasets, then give the upper and lower bounds of the invariant set, and finally propose a new Multi-source Causal Feature Selection algorithm, MCFS. Extensive experiments on synthetic and real-world datasets with 16 feature selection methods validate the effectiveness of MCFS.
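The idea of searching for an invariant set can be illustrated with a small sketch. This is not the MCFS algorithm, whose steps the abstract does not describe. It is a brute-force toy under stated assumptions: a hypothetical binary network X1 -> Y -> X2, two simulated sources in which only the mechanism generating Y from X1 is shared, and an ad hoc tolerance `tol` on estimated conditional probabilities. All function names, probabilities, and sample sizes are invented for illustration.

```python
import itertools
import random
from collections import defaultdict


def sample_source(n, p_x1, p_x2_given_y, rng):
    """Draw n rows from a toy network X1 -> Y -> X2 (all variables binary)."""
    rows = []
    for _ in range(n):
        x1 = int(rng.random() < p_x1)
        # Assumption: the mechanism for Y given its parent X1 is the same in every source.
        y = int(rng.random() < (0.8 if x1 else 0.2))
        # Assumption: the mechanism for the child X2 differs between sources.
        x2 = int(rng.random() < p_x2_given_y[y])
        rows.append({"X1": x1, "X2": x2, "Y": y})
    return rows


def conditional_table(rows, subset, target="Y"):
    """Estimate P(target = 1 | subset = config) for each observed config."""
    counts = defaultdict(lambda: [0, 0])  # config -> [rows seen, rows with target = 1]
    for row in rows:
        config = tuple(row[f] for f in subset)
        counts[config][0] += 1
        counts[config][1] += row[target]
    return {config: ones / total for config, (total, ones) in counts.items()}


def is_invariant(sources, subset, tol=0.05):
    """Check whether the estimated P(Y | subset) agrees across sources within tol."""
    tables = [conditional_table(rows, subset) for rows in sources]
    # Compare only feature configurations observed in every source.
    shared = set(tables[0]).intersection(*tables[1:])
    return all(
        max(t[c] for t in tables) - min(t[c] for t in tables) <= tol
        for c in shared
    )


def invariant_subsets(sources, features, tol=0.05):
    """Exhaustively test every feature subset, including the empty set."""
    found = []
    for size in range(len(features) + 1):
        for subset in itertools.combinations(features, size):
            if is_invariant(sources, subset, tol):
                found.append(subset)
    return found


rng = random.Random(0)
source_a = sample_source(20000, 0.3, {1: 0.9, 0: 0.1}, rng)
source_b = sample_source(20000, 0.7, {1: 0.6, 0: 0.4}, rng)
result = invariant_subsets([source_a, source_b], ["X1", "X2"])
print(result)
```

In this toy setting, the subset containing only the parent X1 should pass the check, because the distribution of X1 and the mechanism for X2 both shift between the two sources while P(Y | X1) does not. The exhaustive search is exponential in the number of features. The abstract states that the paper derives upper and lower bounds of the invariant set, which this sketch does not attempt to use.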

Related Organizations

University of South Australia, Australia
Hefei University of Technology, China (People's Republic of)
University of Massachusetts System, United States
University of Massachusetts Boston, United States

Keywords

Markov blanket, Bayesian network, multiple datasets, causal feature selection, causal invariance

Impact by BIP!

selected citations (74): These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
popularity (Top 1%): This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
influence (Top 10%): This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
impulse (Top 1%): This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.

bronze

Fields of Science

engineering and technology

electrical engineering, electronic engineering, information engineering

Funded by

ARC | Discovery Projects - Grant ID: DP170101306