
doi: 10.15488/19215
Tabular data is essential in data science and machine learning, supporting real-world applications in finance, healthcare, marketing, and customer segmentation. Traditionally, classification on tabular data assigns a single label to each instance, which works well for binary or multiclass scenarios. However, as data becomes more complex, there is a growing need for methods that can handle instances with several characteristics at once, a task known as Multi-label Classification (MLC). MLC is crucial when instances may belong to multiple categories, such as movies that fit multiple genres. Conventional single-label methods often struggle to capture these diverse characteristics, leading to misclassification and poor decision-making. MLC addresses this by assigning multiple relevant labels to each instance using techniques such as ensemble methods, deep learning, and feature engineering. In recommendation systems, for example, it helps align content with users' varied preferences, thereby enhancing user experience and engagement. Despite progress, MLC still faces challenges in capturing label correlations and managing model complexity, especially under label imbalance and at scale. Existing methods that aim to solve these problems are limited in efficiency and scalability. This thesis focuses on modeling tabular multi-label data to achieve high performance and scalability. We address this challenge by introducing SiCMuL, a single-model solution for multi-label classification. This approach defines a unique context for each label, allowing all labels to be trained within their respective contexts using a single model. We also introduce TAMUL, a unified model for multi-label classification that uses transformers to handle label correlations dynamically.
The thesis also explores scenarios involving partially observed true labels and streaming data. We propose an adaptive model for the Partial Multi-Label Learning (PML) task, named PML-CC. PML-CC treats each label individually, progressively extracting high-confidence instances and relevant features for each label to enhance its performance. Finally, we propose a method for handling imbalanced data in the multi-label stream scenario. Through a comprehensive literature review and clear problem definitions, this thesis advances the field of tabular multi-label classification across diverse environments.
Tabular data, Multi-label, Classification, 000 | Computer science, knowledge, systems::004 | Data processing and computer science
