Feature selection using Joint Mutual Information Maximisation

Article · English · Open Access
Bennasar, Mohamed ; Hicks, Yulia Alexandrovna ; Setchi, Rossitza M. (2015)
  • Publisher: Elsevier
  • Journal: Expert Systems with Applications, volume 42, issue 22, pages 8520-8532 (ISSN: 0957-4174)
  • Related identifiers: doi: 10.1016/j.eswa.2015.07.007
  • Subject: Engineering(all) | Computer Science Applications | T1 | Artificial Intelligence

Feature selection is used in many application areas relevant to expert and intelligent systems, such as data mining and machine learning, image processing, anomaly detection, bioinformatics and natural language processing. Feature selection based on information theory is a popular approach due to its computational efficiency, scalability with respect to dataset dimensionality, and independence from the classifier. Common drawbacks of this approach are the lack of information about the interaction between the features and the classifier, and the selection of redundant and irrelevant features; the latter is due to limitations of the employed goal functions, which lead to overestimation of feature significance.

To address this problem, this article introduces two new nonlinear feature selection methods, namely Joint Mutual Information Maximisation (JMIM) and Normalised Joint Mutual Information Maximisation (NJMIM). Both methods use mutual information and the ‘maximum of the minimum’ criterion, which alleviates the problem of overestimation of feature significance, as demonstrated both theoretically and experimentally. The proposed methods are compared with five competing methods on eleven publicly available datasets. The results demonstrate that JMIM outperforms the other methods on most of the tested datasets, reducing the relative average classification error by almost 6% in comparison with the next best performing method. The statistical significance of the results is confirmed by an ANOVA test. Moreover, the method produces the best trade-off between accuracy and stability.
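The ‘maximum of the minimum’ selection rule summarised in the abstract can be illustrated with a short sketch. This is not the authors' code: it assumes discretised features, uses the standard definition of joint mutual information I(Xi, Xj; Y), and the names `joint_mutual_information` and `jmim_select` are hypothetical. At each greedy step the candidate feature whose minimum joint mutual information with the already-selected features is largest gets added to the selected set.

```python
# Illustrative sketch of JMIM-style greedy selection (assumption: discrete features).
from collections import Counter
from math import log2
from typing import Sequence

def _entropy(values: Sequence) -> float:
    # Shannon entropy of a discrete sample, in bits.
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())

def joint_mutual_information(xi, xj, y) -> float:
    # I(Xi, Xj; Y) = H(Xi, Xj) + H(Y) - H(Xi, Xj, Y) for discrete variables.
    return (_entropy(list(zip(xi, xj))) + _entropy(y)
            - _entropy(list(zip(xi, xj, y))))

def jmim_select(features, y, k):
    # Greedily pick k feature indices; `features` is a list of discrete columns.
    remaining = set(range(len(features)))
    # Seed with the feature that has the highest mutual information with the class:
    # I(X; Y) = H(X) + H(Y) - H(X, Y).
    first = max(remaining, key=lambda i: _entropy(features[i]) + _entropy(y)
                                         - _entropy(list(zip(features[i], y))))
    selected, remaining = [first], remaining - {first}
    while len(selected) < k and remaining:
        # 'Maximum of the minimum': maximise the worst-case joint MI with the
        # features already selected, which discourages redundant candidates.
        best = max(remaining, key=lambda i: min(
            joint_mutual_information(features[i], features[s], y) for s in selected))
        selected.append(best)
        remaining.remove(best)
    return selected
```

A normalised variant in the spirit of NJMIM would divide each joint mutual information term by the corresponding joint entropy before taking the minimum; the greedy loop itself is unchanged.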