
This work presents a hybrid genetic algorithm for the following problem: selecting, from a database of examples, the subset that gives the best classification performance with the smallest number of features and the largest number of examples. The method handles prototype editing and feature selection together, as a single optimization problem. The algorithm is divided into phases in which the genetic algorithm is first applied alone and is then combined with local optimization procedures. The transition between these phases is automatic. Several mechanisms are proposed to improve the trade-off between diversity and elitism within the chromosome population. The method is then applied to several data sets and the results are compared with those of the main competing algorithms. / In this paper, a hybrid genetic approach is proposed for designing a sub-database of the original database that has the highest classification performance, the fewest features and the largest number of patterns. The method treats the two problems of editing instance patterns and selecting features as a single optimization problem, and therefore aims to provide a better level of information. The search is optimized by dividing the algorithm into self-controlled phases managed by a combination of a pure genetic process and dedicated local approaches. Heuristics such as an adapted chromosome structure and an evolutionary memory are introduced to promote diversity and elitism in the genetic population. They particularly facilitate the resolution of real applications in the chemometric field, where databases have large feature sizes and medium cardinalities.
The study focuses on the double objective of making the results more reliable while reducing computation time, by combining genetic exploration with a local approach in such a way that excessive CPU costs are avoided. The usefulness of the method is demonstrated on artificial and real data, and its performance is compared with that of other approaches.
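As an illustration of the joint encoding described in the abstract, here is a minimal Python sketch, not the authors' implementation. It assumes a chromosome made of two bit masks (one over features, one over instances), a nearest-neighbor classifier evaluated on a separate validation set, and a fitness that rewards accuracy and retained instances while penalizing retained features. The names `knn_predict`, `fitness` and `local_feature_search`, the weights `w_feat` and `w_inst`, and the bit-flip hill climbing used as a stand-in for the local optimization phase are all hypothetical choices made for this example.

```python
from collections import Counter


def knn_predict(x, prototypes, labels, feat_idx, k=1):
    """Majority vote among the k nearest prototypes, using squared
    Euclidean distance restricted to the selected features."""
    dists = sorted(
        (sum((x[j] - p[j]) ** 2 for j in feat_idx), y)
        for p, y in zip(prototypes, labels)
    )
    votes = Counter(y for _, y in dists[:k])
    return votes.most_common(1)[0][0]


def fitness(chrom, X, y, X_val, y_val, w_feat=0.1, w_inst=0.1):
    """Score a chromosome (feat_mask, inst_mask) of 0/1 tuples.

    Assumed objective: accuracy, minus a penalty on the fraction of
    features kept, plus a reward on the fraction of instances kept.
    """
    feat_mask, inst_mask = chrom
    feat_idx = [j for j, b in enumerate(feat_mask) if b]
    inst_idx = [i for i, b in enumerate(inst_mask) if b]
    if not feat_idx or not inst_idx:
        return 0.0  # an empty selection cannot classify anything
    protos = [X[i] for i in inst_idx]
    labels = [y[i] for i in inst_idx]
    correct = sum(
        knn_predict(x, protos, labels, feat_idx) == t
        for x, t in zip(X_val, y_val)
    )
    acc = correct / len(X_val)
    return (acc
            - w_feat * len(feat_idx) / len(feat_mask)
            + w_inst * len(inst_idx) / len(inst_mask))


def local_feature_search(chrom, score):
    """Hill climbing over single feature-bit flips: on each pass, keep the
    best-scoring neighbor of the current mask; stop when none improves."""
    _, inst_mask = chrom
    best, best_score = chrom, score(chrom)
    improved = True
    while improved:
        improved = False
        current_mask = best[0]
        for j in range(len(current_mask)):
            flipped = tuple(
                b ^ 1 if i == j else b for i, b in enumerate(current_mask)
            )
            cand = (flipped, inst_mask)
            s = score(cand)
            if s > best_score:
                best, best_score, improved = cand, s, True
    return best, best_score
```

In a hybrid scheme of the kind the abstract outlines, a step such as `local_feature_search` would be applied to selected chromosomes between genetic generations; how and when the paper's algorithm switches between phases is not specified here, so that control logic is left out.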
FEATURE SELECTION, [SDE] Environmental Sciences, [INFO.INFO-AI] Computer Science [cs]/Artificial Intelligence [cs.AI], [INFO.INFO-DS] Computer Science [cs]/Data Structures and Algorithms [cs.DS], 006, GENETIC ALGORITHM, METHOD, CLASSIFICATION, [MATH.MATH-KT] Mathematics [math]/K-Theory and Homology [math.KT], HEURISTICS, [INFO.INFO-PF] Computer Science [cs]/Performance [cs.PF], ALGORITHM, K-NEAREST NEIGHBOR
