Name: Multiclass Classification with Imbalanced Datasets for Car Ownership Demand Model – Cost-Sensitive Learning
Creator: Patiphan Kaewwichian
Keywords: FOS: Computer and information sciences, Artificial intelligence, Class (philosophy), Support vector machine, Outcome (game theory), 02 engineering and technology, cross-validation, cost matrix, Automatic License Plate Recognition System, Engineering

Publication type: Article, Other literature type
Date: 31 May 2021
Publisher: Faculty of Transport and Traffic Sciences
Journal: Promet - Traffic&Transportation, volume 33, pages 361-371 (issn: 0353-5320, eissn: 1848-4069)

Authors: Patiphan Kaewwichian

doi: 10.7307/ptt.v33i3.3728, 10.60692/c1aa6-8hq27, 10.60692/sx8qp-02g89

Multiclass Classification with Imbalanced Datasets for Car Ownership Demand Model – Cost-Sensitive Learning

Abstract

When a household car ownership model is used to predict travel demand in support of transportation policy, training a machine learning model on imbalanced data degrades the training process. The household car ownership data obtained from the study project for the expressway preparation in Khon Kaen Province (2015) form an imbalanced dataset: the minority class has fewer members than the other answer classes, which biases classification. This research therefore proposes balancing the dataset with cost-sensitive learning applied to decision tree, k-nearest neighbors (kNN), and naive Bayes algorithms. Before building the 3-class model, k-fold cross-validation was applied to the dataset to obtain the true positive rate (TPR) used to validate model performance. The kNN algorithm gave the best performance in predicting the minority class compared with the other algorithms. For the rural and suburban area types, which have very different imbalance ratios, its TPR before balancing was 46.9% and 46.4%, respectively; after balancing (MCN1), the TPR rose to 84.4% and 81.4%, respectively.
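
As an illustration of the two ideas the abstract relies on, the sketch below computes a per-class TPR and applies a cost matrix by choosing the class with the lowest expected misclassification cost. It is not the author's implementation: the function names, the class probabilities, the labels, and the 5x cost placed on missing the minority class are all hypothetical values chosen for the example.

```python
from collections import Counter


def per_class_tpr(y_true, y_pred):
    """TPR (recall) per class: correctly predicted members / all true members."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {c: hits[c] / totals[c] for c in totals}


def min_expected_cost_class(probs, cost):
    """Return the class index with the lowest expected misclassification cost.

    probs[i]   : estimated probability that the true class is i
    cost[i][j] : cost of predicting class j when the true class is i
    """
    n = len(probs)
    expected = [sum(probs[i] * cost[i][j] for i in range(n)) for j in range(n)]
    return min(range(n), key=expected.__getitem__)


# Hypothetical 3-class setting in which class 2 is the minority class.
uniform_cost = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
skewed_cost = [[0, 1, 1], [1, 0, 1], [5, 5, 0]]  # assumed: missing class 2 costs 5x

probs = [0.5, 0.2, 0.3]  # assumed classifier output for one household
plain = min_expected_cost_class(probs, uniform_cost)    # cost-blind decision
weighted = min_expected_cost_class(probs, skewed_cost)  # cost-sensitive decision

# Hypothetical labels, used only to show the TPR computation.
y_true = [0, 0, 0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 0, 1, 1, 0, 2, 0]
tpr = per_class_tpr(y_true, y_pred)
```

With uniform costs the decision follows the most probable class, while the skewed matrix moves the same household to the minority class, which is the mechanism by which cost-sensitive learning can raise minority-class TPR. In a k-fold setting, the TPR would be computed on each held-out fold and then averaged.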

Keywords

FOS: Computer and information sciences, Artificial intelligence, Class (philosophy), Support vector machine, Outcome (game theory), cross-validation, cost matrix, Automatic License Plate Recognition System, Engineering, Machine learning, Media Technology, FOS: Mathematics, Data mining, TA1001-1280, decision trees, Naive Bayes classifier, Mathematical economics, tour-based model, Traffic Flow Prediction and Forecasting, Building and Construction, Predictive Modeling, Computer science, Transportation engineering, k-nearest neighbors (kNN), Physical Sciences, Data Mining in Various Applications, Short-Term Forecasting, Mathematics, Information Systems

Impact by BIP!

	selected citations: 6. These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
	popularity: Top 10%. This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
	influence: Average. This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
	impulse: Top 10%. This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.