Hybrid genetic algorithm for dual selection

Publication: Article, 19 Oct 2007, France, English
Publisher: Springer Science and Business Media LLC
Journal: Pattern Analysis and Applications, volume 11, pages 179-198 (ISSN: 1433-7541, eISSN: 1433-755X)

Authors: Ros, F.; Guillaume, Serge; Pintore, M.; Chrétien, J.

doi: 10.1007/s10044-007-0089-3

Abstract

This work presents a hybrid genetic algorithm for the following problem: select, from a database of examples, the subset that gives the best classification performance with the smallest number of features and the largest number of examples. The method handles prototype editing and feature selection together, as a single optimization problem. The algorithm is divided into phases in which the genetic algorithm is first applied alone and is then combined with local optimization procedures. The transition between these phases is automatic. Several mechanisms are proposed to improve the trade-off between diversity and elitism within the population of chromosomes. The method is then applied to several data sets, and the results are compared with those of the main competing algorithms.

In this paper, a hybrid genetic approach is proposed to solve the problem of designing a subdatabase of the original one with the highest classification performance, the lowest number of features and the highest number of patterns. The method treats the double problem of editing instance patterns and selecting features as a single optimization problem, and therefore aims at providing a better level of information. The search is optimized by dividing the algorithm into self-controlled phases managed by a combination of a pure genetic process and dedicated local approaches. Different heuristics, such as an adapted chromosome structure and evolutionary memory, are introduced to promote diversity and elitism in the genetic population. They particularly facilitate the resolution of real applications in the chemometric field, which present databases with large feature sizes and medium cardinalities. The study focuses on the double objective of enhancing the reliability of results while reducing the time consumed, by combining genetic exploration and a local approach in such a way that excessive CPU costs are avoided. The usefulness of the method is demonstrated with artificial and real data, and its performance is compared to other approaches.
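The dual-selection idea in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: it assumes a single binary chromosome (one bit per feature followed by one bit per pattern), a leave-one-out 1-nearest-neighbour accuracy, a weighted-sum fitness with arbitrary weights, and a fixed generation (`local_from`) at which a simple bit-flip hill climb is switched on, whereas the paper describes self-controlled phase transitions. All function names and parameters are hypothetical.

```python
import random


def one_nn_accuracy(X, y, feat_mask, inst_mask):
    """Leave-one-out accuracy of 1-NN using only selected features and patterns."""
    feats = [j for j, keep in enumerate(feat_mask) if keep]
    protos = [i for i, keep in enumerate(inst_mask) if keep]
    if not feats or not protos:
        return 0.0
    correct = 0
    for i, x in enumerate(X):
        # Exclude the query itself so a stored pattern cannot vote for itself.
        candidates = [p for p in protos if p != i]
        if not candidates:
            continue
        nearest = min(
            candidates,
            key=lambda p: sum((x[j] - X[p][j]) ** 2 for j in feats),
        )
        correct += y[nearest] == y[i]
    return correct / len(X)


def fitness(X, y, chrom, w_acc=0.8, w_feat=0.1, w_inst=0.1):
    """Assumed weighted sum: high accuracy, few features, many patterns."""
    n_feat = len(X[0])
    feat_mask, inst_mask = chrom[:n_feat], chrom[n_feat:]
    acc = one_nn_accuracy(X, y, feat_mask, inst_mask)
    feat_term = 1 - sum(feat_mask) / n_feat  # fewer features is better
    inst_term = sum(inst_mask) / len(X)      # more patterns is better
    return w_acc * acc + w_feat * feat_term + w_inst * inst_term


def local_search(X, y, chrom):
    """One pass of bit-flip hill climbing; a flip is kept only if it improves fitness."""
    best, best_fit = chrom[:], fitness(X, y, chrom)
    for k in range(len(best)):
        best[k] ^= 1
        f = fitness(X, y, best)
        if f > best_fit:
            best_fit = f
        else:
            best[k] ^= 1  # revert the flip
    return best, best_fit


def hybrid_ga(X, y, pop_size=20, generations=30, local_from=20, p_mut=0.05, seed=0):
    """Pure GA for the first generations, then GA plus local search on the best chromosome."""
    rng = random.Random(seed)
    length = len(X[0]) + len(X)
    pop = [[rng.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    best, best_fit = None, -1.0
    for gen in range(generations):
        scored = sorted(
            ((fitness(X, y, c), c) for c in pop), key=lambda t: t[0], reverse=True
        )
        top_fit, top = scored[0]
        if gen >= local_from:  # fixed switch point, an assumption of this sketch
            top, top_fit = local_search(X, y, top)
        if top_fit > best_fit:
            best, best_fit = top[:], top_fit
        next_pop = [best[:]]  # elitism: the best chromosome always survives
        while len(next_pop) < pop_size:
            # Binary tournament selection, uniform crossover, bit-flip mutation.
            p1 = max(rng.sample(scored, 2), key=lambda t: t[0])[1]
            p2 = max(rng.sample(scored, 2), key=lambda t: t[0])[1]
            child = [a if rng.random() < 0.5 else b for a, b in zip(p1, p2)]
            child = [g ^ 1 if rng.random() < p_mut else g for g in child]
            next_pop.append(child)
        pop = next_pop
    return best, best_fit
```

Calling `hybrid_ga(X, y)` on a list of numeric feature vectors `X` and labels `y` returns the best chromosome found and its fitness; the first `len(X[0])` bits are the feature mask and the remaining bits are the pattern mask. The weighted sum is only one way to combine the three objectives and was chosen here for brevity.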

Country

France

Related Organizations

Gemalto (Israel)
National Research Institute for Agriculture, Food and Environment (France)
Département Sciences sociales, agriculture et alimentation, espace et environnement (France)
Montpellier SupAgro (France)

Keywords

FEATURE SELECTION, [SDE] Environmental Sciences, [INFO.INFO-AI] Computer Science [cs]/Artificial Intelligence [cs.AI], [INFO.INFO-DS] Computer Science [cs]/Data Structures and Algorithms [cs.DS], 006, GENETIC ALGORITHM, METHOD, CLASSIFICATION, [MATH.MATH-KT] Mathematics [math]/K-Theory and Homology [math.KT], HEURISTICS, [INFO.INFO-PF] Computer Science [cs]/Performance [cs.PF], ALGORITHM, K-NEAREST NEIGHBOR

Impact by BIP!

selected citations: 19. These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
popularity: Top 10%. This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
influence: Top 10%. This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
impulse: Average. This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.