
We present a new distributed computing algorithm, Parallel Pattern Discovery (PPD), for constrained Non-negative Matrix Factorization (NMF). Our implementation allows a specific pattern to be constrained during optimization while minimizing reconstruction error. PPD operates within a distributed environment using a message-passing interface. Distributing the algorithm improves scalability and allows operation in single- or multiple-system environments. The algorithm was tested on a set of time-series, dose-dependent mRNA gene expression data. PPD accurately identified patterns within the data and reconstructed the original matrices, and it achieved a smaller reconstruction error than standard NMF algorithms. Development focused on running PPD as part of a system that identifies significantly contributing genes. PPD is run first to find patterns in biological data. It is followed by Gene Set Enrichment (GSE), which takes the pattern data and relates it back to genetic pathways.
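To make the idea of a constrained factorization concrete, the following is a minimal single-process sketch, not the PPD implementation itself. It assumes the standard Lee-Seung multiplicative updates for the Frobenius reconstruction error and models the constraint by pinning one row of the pattern matrix to a known profile. The function names `constrained_nmf` and `reconstruction_error` and the parameter `fixed_pattern` are hypothetical, and the distributed message-passing layer is omitted.

```python
import numpy as np


def constrained_nmf(V, rank, fixed_pattern=None, n_iter=500, eps=1e-9, seed=0):
    """Factor a non-negative matrix V (genes x conditions) as W @ H.

    Assumption for illustration: the constraint is a known pattern that is
    pinned as row 0 of H and never updated. This is not the PPD algorithm.
    """
    rng = np.random.default_rng(seed)
    n_rows, n_cols = V.shape
    W = rng.random((n_rows, rank)) + eps   # amplitudes, one column per pattern
    H = rng.random((rank, n_cols)) + eps   # patterns, one row per pattern
    if fixed_pattern is not None:
        H[0] = fixed_pattern

    for _ in range(n_iter):
        # Multiplicative update for the patterns (keeps entries non-negative).
        H = H * (W.T @ V) / (W.T @ W @ H + eps)
        if fixed_pattern is not None:
            H[0] = fixed_pattern           # re-impose the constrained pattern
        # Multiplicative update for the amplitudes.
        W = W * (V @ H.T) / (W @ H @ H.T + eps)
    return W, H


def reconstruction_error(V, W, H):
    """Frobenius norm of the residual V - W @ H."""
    return np.linalg.norm(V - W @ H)
```

Under these assumptions, calling `constrained_nmf(V, rank=3, fixed_pattern=p)` returns non-negative factors whose first pattern row equals `p`, and `reconstruction_error` gives the quantity that the abstract compares against standard NMF.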
