
doi: 10.25560/96812
handle: 10044/1/96812
Technological advancements and global data sharing allow for the collection of information from multiple sources on the same samples. Such data are usually referred to as multi-view data, and the dataset from each source as a data-view. Statistical properties, such as heterogeneity and noise, make the analysis of multi-view data challenging. An integrative analysis of the different data-views provides an improved and more accurate understanding of the data. In this thesis, both linear and non-linear solutions are investigated for the visualisation, clustering and classification of multi-view data. In particular, various solutions that perform data integration through dimensionality reduction are explored. Sparse solutions of Canonical Correlation Analysis, a well-known linear integration approach for two data-views, are described, and extensions to the analysis of multiple data-views are proposed. Further, adaptations of non-linear dimensionality reduction (or manifold learning) methods for multi-view data are presented. The proposed algorithms are based on t-distributed Stochastic Neighbour Embedding (t-SNE), Locally Linear Embedding (LLE) and Isometric Feature Mapping (ISOMAP). Among the manifold learning approaches, multi-SNE, the multi-view extension of t-SNE, was found to be the best-performing solution, providing accurate visualisations of the samples, as confirmed both qualitatively and quantitatively. An extension of the algorithm that allows the classification of the samples in a semi-supervised manner is introduced. The uncommon notion of incorporating the response variables as an additional data-view is explored for both linear and non-linear solutions. This thesis ends with the analysis of single-cell multi-omics data, a new and challenging type of biological data. The proposed linear and non-linear integrative algorithms were used for the estimation of cell subtypes, cell identification, visualisation and other tasks.
This thesis investigates the limitations and strengths of the proposed algorithms through various experiments on numerous real and synthetic multi-view datasets.
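For readers unfamiliar with the linear approach named in the abstract, the following is a minimal sketch of classical two-view Canonical Correlation Analysis, assuming two data-views measured on the same samples. It is not the sparse or multiple-view variant proposed in the thesis; the function name `cca_first_pair` and the ridge term `reg` are illustrative assumptions, not taken from the source.

```python
import numpy as np

def cca_first_pair(X, Y, reg=1e-6):
    """First pair of canonical weight vectors for two data-views.

    X, Y: arrays of shape (n_samples, p) and (n_samples, q), rows aligned
    on the same samples. `reg` is a small ridge term (an assumption of this
    sketch) that keeps the covariance matrices invertible.
    """
    # Centre each view so that covariances can be computed directly.
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    n = X.shape[0]

    # Within-view and between-view sample covariances.
    Sxx = X.T @ X / (n - 1) + reg * np.eye(X.shape[1])
    Syy = Y.T @ Y / (n - 1) + reg * np.eye(Y.shape[1])
    Sxy = X.T @ Y / (n - 1)

    # Whiten each view with a Cholesky factor: Sxx = Lx Lx^T, Syy = Ly Ly^T.
    Lx = np.linalg.cholesky(Sxx)
    Ly = np.linalg.cholesky(Syy)

    # M = Lx^{-1} Sxy Ly^{-T}; its singular values are the canonical correlations.
    M = np.linalg.solve(Lx, Sxy)
    M = np.linalg.solve(Ly, M.T).T
    U, s, Vt = np.linalg.svd(M, full_matrices=False)

    # Map the leading singular vectors back to the original feature spaces.
    wx = np.linalg.solve(Lx.T, U[:, 0])
    wy = np.linalg.solve(Ly.T, Vt[0])
    return wx, wy, s[0]
```

Projecting each view onto its weight vector (`X @ wx`, `Y @ wy`) gives one shared low-dimensional coordinate per sample, which is the sense in which CCA integrates two data-views through dimensionality reduction.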
620, 510
