Multi-view data integration by linear and non-linear dimensionality reduction

Type: Doctoral thesis (Other literature type)

Date: 01 Jan 2021

Embargo end date: 17 May 2022

Country: United Kingdom

Publisher: Imperial College London

Authors: Rodosthenous, Theodoulos

doi: 10.25560/96812

handle: 10044/1/96812

Abstract

Technological advancements and global data sharing allow information to be collected from multiple sources on the same samples. Such data are usually referred to as multi-view data, and the dataset from each source as a data-view. Statistical properties, such as heterogeneity and noise, make the analysis of multi-view data challenging. An integrative analysis of the different data-views provides an improved and more accurate understanding of the data. In this thesis, both linear and non-linear solutions are investigated for the visualisation, clustering and classification of multi-view data. In particular, various solutions that perform data integration through dimensionality reduction are explored. Sparse solutions of Canonical Correlation Analysis, a well-known linear integration approach for two data-views, are described, and extensions to the analysis of multiple data-views are proposed. Further, adaptations of non-linear dimensionality reduction (or manifold learning) methods for multi-view data are presented. The proposed algorithms are based on t-distributed Stochastic Neighbour Embedding (t-SNE), Locally Linear Embedding (LLE) and Isometric Feature Mapping (ISOMAP). The manifold learning approach multi-SNE, the multi-view extension based on t-SNE, was found to be the best-performing solution, providing accurate visualisations of the samples, as confirmed both qualitatively and quantitatively. An extension of the algorithm that allows the samples to be classified in a semi-supervised manner is introduced. The uncommon notion of incorporating the response variables as an additional data-view is explored for both linear and non-linear solutions. The thesis ends with the analysis of single-cell multi-omics data, a new and challenging type of biological data. The proposed linear and non-linear integrative algorithms were applied to the estimation of cell subtypes, cell identification, visualisation and other tasks.
This thesis investigates the limitations and strengths of the proposed algorithms through various experiments on numerous real and synthetic multi-view datasets.
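
For readers unfamiliar with Canonical Correlation Analysis, the following is a minimal sketch of the classical (non-sparse) method on two data-views. It is not the thesis's sparse variant or its multi-view extension. The function name `cca_first_pair`, the ridge term `reg` and the synthetic data are assumptions made for this illustration only. The sketch whitens each data-view and takes the leading singular vectors of the whitened cross-covariance, which yields the pair of projections with maximal correlation.

```python
import numpy as np


def cca_first_pair(X1, X2, reg=1e-8):
    """First canonical pair for two data-views measured on the same samples.

    Illustrative helper (hypothetical name). `reg` is a small ridge term,
    assumed here only to keep the covariance matrices invertible.
    Returns (rho, w1, w2): the first canonical correlation and the weights.
    """
    # Centre each view so that cross-products are covariances.
    X1 = X1 - X1.mean(axis=0)
    X2 = X2 - X2.mean(axis=0)
    n = X1.shape[0]

    S11 = X1.T @ X1 / (n - 1) + reg * np.eye(X1.shape[1])
    S22 = X2.T @ X2 / (n - 1) + reg * np.eye(X2.shape[1])
    S12 = X1.T @ X2 / (n - 1)

    # Cholesky factors: S11 = L1 L1^T and S22 = L2 L2^T.
    L1 = np.linalg.cholesky(S11)
    L2 = np.linalg.cholesky(S22)

    # Whitened cross-covariance K = L1^{-1} S12 L2^{-T}.
    K = np.linalg.solve(L1, S12)
    K = np.linalg.solve(L2, K.T).T

    # Singular values of K are the canonical correlations.
    U, s, Vt = np.linalg.svd(K, full_matrices=False)

    # Map the leading singular vectors back to the original feature spaces.
    w1 = np.linalg.solve(L1.T, U[:, 0])
    w2 = np.linalg.solve(L2.T, Vt[0])
    return s[0], w1, w2


# Synthetic two-view data sharing one latent signal z (illustrative only).
rng = np.random.default_rng(0)
z = rng.standard_normal(200)
X1 = np.outer(z, [1.0, -1.0, 0.5]) + 0.1 * rng.standard_normal((200, 3))
X2 = np.outer(z, [0.8, 0.3]) + 0.1 * rng.standard_normal((200, 2))

rho, w1, w2 = cca_first_pair(X1, X2)
print(f"first canonical correlation: {rho:.3f}")
```

Because both synthetic views are driven by the same latent variable, the projections `X1 @ w1` and `X2 @ w2` should be strongly correlated. A sparse variant would additionally penalise the weights so that only a few features per data-view receive non-zero coefficients.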

Country

United Kingdom

Related Organizations

Imperial College London
United Kingdom

Keywords

620, 510

