Integrative Generalized Convex Clustering Optimization and Feature Selection for Mixed Multi-View Data

Keywords: FOS: Computer and information sciences, Convex programming, convex optimization, Classification and discrimination; cluster analysis (statistical aspects), Machine Learning (stat.ML), Bregman divergences, Integrative clustering, sparse clustering, Methodology (stat.ME), Statistical aspects of big data and data science

Wang, Minjie; Allen, Genevera I.



Journal of Machine Learning Research: Article (data source: Europe PubMed Central)
arXiv.org e-Print Archive: Preprint, 2019 (data source: arXiv.org e-Print Archive)
zbMATH Open: Article, 2021 (data source: zbMATH Open)
https://dx.doi.org/10.48550/ar...: Article, 2019; license: arXiv Non-Exclusive Distribution (data source: Datacite)
DBLP: Article (data source: DBLP)


Article, Preprint. 01 Jan 2019. Embargo end date: 01 Jan 2019. Publisher: arXiv. Journal: Journal of Machine Learning Research (JMLR), volume 22 (ISSN: 1532-4435).


doi: 10.48550/arxiv.1912.05449

pmid: 34744522

pmc: PMC8570363

arXiv: 1912.05449

handle: 1911/110650, 1911/111581


Abstract

In mixed multi-view data, multiple sets of diverse features are measured on the same set of samples. By integrating all available data sources, we seek to discover common group structure among the samples that may be hidden in individualistic cluster analyses of a single data-view. While several techniques for such integrative clustering have been explored, we propose and develop a convex formalization that will inherit the strong statistical, mathematical and empirical properties of increasingly popular convex clustering methods. Specifically, our Integrative Generalized Convex Clustering Optimization (iGecco) method employs different convex distances, losses, or divergences for each of the different data views with a joint convex fusion penalty that leads to common groups. Additionally, integrating mixed multi-view data is often challenging when each data source is high-dimensional. To perform feature selection in such scenarios, we develop an adaptive shifted group-lasso penalty that selects features by shrinking them towards their loss-specific centers. Our so-called iGecco+ approach selects features from each data-view that are best for determining the groups, often leading to improved integrative clustering. To fit our model, we develop a new type of generalized multi-block ADMM algorithm using sub-problem approximations that more efficiently fits our model for big data sets. Through a series of numerical experiments and real data examples on text mining and genomics, we show that iGecco+ achieves superior empirical performance for high-dimensional mixed multi-view data.
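The abstract combines three ingredients: a separate loss for each data view, a single convex fusion penalty shared by all views (so that two samples are fused in every view at once, yielding common groups), and a group-lasso penalty that shrinks each feature toward a center. The Python sketch below illustrates these generic building blocks under stated assumptions; it is not the authors' iGecco/iGecco+ implementation. The loss choices (half squared error for continuous views, Bernoulli negative log-likelihood on the log-odds scale for binary views), the pairwise weights `weights`, the per-view weights `view_weights`, and every function name are hypothetical. `shifted_group_prox` is only the standard proximal operator of `lam * ||u - c||_2`, the kind of step a proximal or ADMM solver might call; it does not reproduce the paper's adaptive weights or its multi-block ADMM algorithm.

```python
import numpy as np


def view_loss(X, U, kind):
    """Loss for one data view (hypothetical choices, for illustration only)."""
    X = np.asarray(X, dtype=float)
    U = np.asarray(U, dtype=float)
    if kind == "gaussian":
        # Half squared Euclidean distance between the data and their centroids.
        return 0.5 * np.sum((X - U) ** 2)
    if kind == "bernoulli":
        # Bernoulli negative log-likelihood, with U on the log-odds scale.
        return np.sum(np.logaddexp(0.0, U) - X * U)
    raise ValueError(f"unknown loss kind: {kind}")


def joint_fusion_penalty(Us, weights):
    """sum_{i<j} w_ij * sqrt(sum_k ||U_k[i] - U_k[j]||^2), summed over all views k."""
    Us = [np.asarray(U, dtype=float) for U in Us]
    n = Us[0].shape[0]
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            # Differences of rows i and j from every view form ONE group,
            # so the penalty pushes the two samples together in all views at once.
            sq = sum(np.sum((U[i] - U[j]) ** 2) for U in Us)
            total += weights[i, j] * np.sqrt(sq)
    return total


def integrative_objective(Xs, Us, kinds, view_weights, weights, gamma):
    """Weighted per-view losses plus gamma times the shared fusion penalty."""
    fit = sum(
        pi * view_loss(X, U, kind)
        for X, U, kind, pi in zip(Xs, Us, kinds, view_weights)
    )
    return fit + gamma * joint_fusion_penalty(Us, weights)


def shifted_group_prox(v, center, lam):
    """Proximal operator of lam * ||u - center * 1||_2, evaluated at the vector v."""
    v = np.asarray(v, dtype=float)
    d = v - center
    norm = np.linalg.norm(d)
    if norm <= lam:
        # The whole feature collapses onto its center, so it no longer
        # distinguishes any samples (i.e. it is deselected).
        return np.full(v.shape, float(center))
    return center + (1.0 - lam / norm) * d


# Usage: two samples, one continuous view and one binary view.
X_cont = np.array([[0.0, 0.0], [3.0, 4.0]])
X_bin = np.array([[1.0], [0.0]])
W = np.ones((2, 2))
val = integrative_objective(
    [X_cont, X_bin], [X_cont, np.zeros((2, 1))],
    ["gaussian", "bernoulli"], [1.0, 1.0], W, gamma=0.1,
)
print(val)
print(shifted_group_prox([3.0, 4.0], 0.0, 1.0))
```

Grouping every view's row differences under one square root is what makes the fusion joint: a single penalty term can only reach zero when the two samples coincide in all views simultaneously.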

Related Organizations

Rice University, United States
Baylor College of Medicine, United States

Keywords

FOS: Computer and information sciences, Convex programming, convex optimization, Classification and discrimination; cluster analysis (statistical aspects), Machine Learning (stat.ML), Bregman divergences, Integrative clustering, sparse clustering, Methodology (stat.ME), Statistical aspects of big data and data science, feature selection, Statistics - Machine Learning, convex clustering, GLM deviance, Statistics - Methodology, clustering


Related research: Multi-view data integration by linear and non-linear dimensionality reduction (2021; listed among the top similar documents)

Impact by BIP!

- Selected citations: 2 (derived from selected sources; an alternative to the "Influence" indicator, which reflects the overall impact of an article based on the underlying citation network)
- Popularity: Average (the "current" impact or attention of the article in the research community, based on the underlying citation network)
- Influence: Average (the overall/total impact of the article in the research community, based on the underlying citation network, diachronically)
- Impulse: Average (the initial momentum of the article directly after its publication, based on the underlying citation network)
