Classification Problem in Imbalanced Datasets

Classification is a data mining task that aims to extract knowledge from large datasets. There are two kinds of classification. The first, known as complete classification, is applied to balanced datasets. When classification is applied to imbalanced datasets, it is called partial classification, or the problem of classification in imbalanced datasets; this is a fundamental problem in machine learning and has received much attention. Given its importance, a large number of techniques have been proposed to address it. These proposals can be divided into three levels: the algorithm level, the data level, and the hybrid level. In this chapter, we present the classification problem in imbalanced datasets, its application domains, appropriate performance measures, and its approaches and techniques.
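The abstract does not name specific performance measures or techniques, so the following is only an illustrative sketch under stated assumptions. It uses a hypothetical 95/5 toy dataset to show why plain accuracy can mislead on imbalanced data, and it uses random oversampling as one common example of a data-level technique. The function names (`metrics`, `random_oversample`) and the choice of recall, F1, and G-mean are assumptions made for illustration, not the chapter's own method.

```python
import random
from collections import Counter


def metrics(y_true, y_pred, positive=1):
    """Compute a few measures often reported for imbalanced problems."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    accuracy = (tp + tn) / len(pairs)
    recall = tp / (tp + fn) if (tp + fn) else 0.0          # minority-class hit rate
    specificity = tn / (tn + fp) if (tn + fp) else 0.0     # majority-class hit rate
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    g_mean = (recall * specificity) ** 0.5                 # geometric mean of both rates
    return {"accuracy": accuracy, "recall": recall,
            "specificity": specificity, "f1": f1, "g_mean": g_mean}


def random_oversample(X, y, seed=0):
    """Data-level sketch: duplicate random minority samples until classes match."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        indices = [i for i, lab in enumerate(y) if lab == label]
        for _ in range(target - n):
            i = rng.choice(indices)
            X_out.append(X[i])
            y_out.append(label)
    return X_out, y_out


# Hypothetical toy data: 95 majority (0) and 5 minority (1) examples.
y_true = [0] * 95 + [1] * 5
X = [[float(i)] for i in range(100)]

# A trivial classifier that always predicts the majority class.
y_pred = [0] * 100
scores = metrics(y_true, y_pred)

# Rebalance the training data at the data level.
X_bal, y_bal = random_oversample(X, y_true)
balanced_counts = Counter(y_bal)

print(scores)
print(balanced_counts)
```

In this toy case the always-majority classifier scores high on accuracy while never detecting a minority example, which is why measures that account for the minority class are usually preferred. Oversampling is only one option; algorithm-level approaches (for example, cost-sensitive learning) and hybrid approaches address the same issue differently.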

Related Organizations

University of Sciences and Technology Houari Boumediene
Algeria

Keywords

Artificial intelligence, Imbalanced Data, Handling Imbalanced Data in Classification Problems, Instance Selection, Classification, Pattern recognition (psychology), Computer science, Learning with Noisy Labels in Machine Learning, Machine Learning Algorithms, Hierarchical Classification, Physical Sciences, Multi-label Text Classification in Machine Learning, Machine learning

Impact by BIP!

selected citations: 13 (citations derived from selected sources)
popularity: Top 10% (current attention in the research community, based on the underlying citation network)
influence: Average (overall impact in the research community over time, based on the underlying citation network)
impulse: Top 10% (initial momentum directly after publication, based on the underlying citation network)
