- University of the Witwatersrand, South Africa
- University of Johannesburg, South Africa
The identification of a differentially expressed set of genes in microarray data analysis is essential, both for identifying novel oncogenic pathways and for automated diagnosis. This paper assesses the effectiveness of the Population-Based Incremental Learning (PBIL) algorithm in identifying a class-differentiating gene set for sample classification. PBIL iteratively evolves the genome of a search population by updating a probability vector, guided by the class separability achieved by candidate combinations of features. PBIL is compared both to a standard Genetic Algorithm (GA) and to an Analysis of Variance (ANOVA). The algorithms are tested on a publicly available three-class leukaemia microarray data set (n = 72). Over 30 repeated runs of each algorithm, PBIL achieved an average feature-space separability of 97.04%, while GA achieved an average of 96.39%. PBIL also found smaller feature spaces than GA (326 genes versus 2652 genes), thus excluding a large percentage of redundant features. On average, PBIL also outperformed the ANOVA approach for n = 2652 (91.62%), q < 0.05 (94.44%), q < 0.01 (93.06%) and q < 0.005 (95.83%). The best PBIL run (98.61%) even outperformed ANOVA for n = 326 and q < 0.001 (both 97.22%). PBIL's performance is ascribed to its ability to direct the search not only towards the optimal solution but also away from the worst.
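The abstract does not state the exact update rule or parameter values used, so the following is only a minimal sketch of a Baluja-style PBIL for gene selection, written under stated assumptions. Each candidate gene set is assumed to be a binary include/exclude mask. The probability vector is moved towards the best sampled mask and, where the best and worst masks disagree, pushed further away from the worst (one reading of "away from the worst"). The fitness function `toy_fitness`, the `INFORMATIVE` gene set, the function name `pbil_select` and every rate and mutation setting are hypothetical stand-ins, not the paper's class-separability measure or configuration.

```python
import random
from typing import Callable, List


def pbil_select(
    n_features: int,
    fitness: Callable[[List[int]], float],
    pop_size: int = 50,
    generations: int = 100,
    lr: float = 0.1,        # hypothetical rate towards the best mask
    neg_lr: float = 0.075,  # hypothetical extra rate away from the worst mask
    mut_prob: float = 0.02,
    mut_shift: float = 0.05,
    seed: int = 0,
) -> List[int]:
    """Return the best binary gene mask seen by a PBIL search (illustrative sketch)."""
    if generations < 1 or pop_size < 1:
        raise ValueError("generations and pop_size must be positive")
    rng = random.Random(seed)
    p = [0.5] * n_features  # probability that gene i is included in a sample
    best_mask: List[int] = []
    best_score = float("-inf")

    for _ in range(generations):
        # 1. Sample a population of binary masks from the probability vector.
        pop = [[1 if rng.random() < pi else 0 for pi in p] for _ in range(pop_size)]
        scores = [fitness(mask) for mask in pop]
        best = pop[max(range(pop_size), key=scores.__getitem__)]
        worst = pop[min(range(pop_size), key=scores.__getitem__)]
        if max(scores) > best_score:
            best_mask, best_score = best[:], max(scores)

        for i in range(n_features):
            # 2. Shift each probability towards the best mask.
            p[i] = p[i] * (1 - lr) + best[i] * lr
            # 3. Where best and worst disagree, move further away from the worst.
            if best[i] != worst[i]:
                p[i] = p[i] * (1 - neg_lr) + best[i] * neg_lr
            # 4. Occasionally perturb a probability to keep exploring.
            if rng.random() < mut_prob:
                p[i] = p[i] * (1 - mut_shift) + rng.randint(0, 1) * mut_shift

    return best_mask


# Hypothetical stand-in for class separability: genes 0-2 are "informative",
# and every selected gene costs a small penalty (favouring small gene sets).
INFORMATIVE = {0, 1, 2}


def toy_fitness(mask: List[int]) -> float:
    return sum(mask[i] for i in INFORMATIVE) - 0.1 * sum(mask)


mask = pbil_select(n_features=10, fitness=toy_fitness)
print([i for i, bit in enumerate(mask) if bit])
```

The size penalty in the toy fitness loosely mirrors the abstract's observation that PBIL favoured much smaller gene sets than GA. In practice the fitness would be replaced by a class-separability score computed on the selected microarray features.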