SMOTE-Text: A Modified SMOTE for Turkish Text Classification

Publication: Part of book or chapter of book, Article · 01 Jan 2021 · English · Publisher: Springer International Publishing

Authors: Nur Curukoglu; Alper Ozpinar

doi: 10.1007/978-3-030-79357-9_9


Abstract

One of the most common problems faced by large enterprises is the loss of know-how when employees change roles or leave. Creating a well-organized, indexed, connected, user-friendly, and sustainable digital enterprise memory can solve this problem and provide a practical way to transfer know-how to newly recruited personnel. One of the problems this raises is the correct classification of the documents that will be stored in the digital library. Text classification, also known as text categorization, is in its most general sense the process of assigning text to labeled groups. A document can relate to one or more subjects, and choosing the correct labels is sometimes challenging. The contents of an information repository are distributed according to the company's business areas. Good machine-learning-based text classification requires balanced datasets built from previous samples related to the business. A lack of documents from minor business areas produces an imbalanced training dataset. To overcome this problem, synthetic data can be created, but the existing methods are suited to numerical inputs and are not appropriate for text classification. This article presents a modified version of the Synthetic Minority Oversampling Technique (SMOTE) algorithm for text classification that integrates the Turkish dictionary into the oversampling step.
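The abstract does not spell out the algorithm, so the following is only a minimal sketch of one way dictionary-based text oversampling could work, not the authors' implementation. It assumes that synthetic minority documents are produced by swapping words for dictionary synonyms. The `SYNONYMS` table, the function names, and the `replace_prob` parameter are all hypothetical and exist purely for illustration.

```python
import random

# Hypothetical toy synonym table. A real system would load a Turkish
# dictionary or thesaurus resource instead of this hand-written mapping.
SYNONYMS = {
    "hızlı": ["çabuk", "süratli"],
    "rapor": ["tutanak"],
    "büyük": ["iri", "kocaman"],
}


def synthesize(text, synonyms, rng, replace_prob=0.5):
    """Return a variant of `text` with some words swapped for synonyms.

    Words are matched exactly (no stemming or case folding), which is a
    simplification for an agglutinative language such as Turkish.
    """
    out = []
    for word in text.split():
        options = synonyms.get(word)
        if options and rng.random() < replace_prob:
            out.append(rng.choice(options))
        else:
            out.append(word)
    return " ".join(out)


def oversample_minority(texts, labels, synonyms, seed=0):
    """Append synthetic documents until every class matches the largest one."""
    rng = random.Random(seed)

    # Group the original documents by class label.
    by_class = {}
    for text, label in zip(texts, labels):
        by_class.setdefault(label, []).append(text)

    target = max(len(docs) for docs in by_class.values())

    new_texts, new_labels = list(texts), list(labels)
    for label, docs in by_class.items():
        # Only classes smaller than the largest class get synthetic samples.
        for _ in range(target - len(docs)):
            new_texts.append(synthesize(rng.choice(docs), synonyms, rng))
            new_labels.append(label)
    return new_texts, new_labels
```

Classic SMOTE interpolates between a minority sample and one of its nearest neighbours in a numeric feature space, which has no direct meaning for raw words. The sketch replaces that interpolation step with synonym substitution and keeps only the idea of generating new minority-class samples until the classes are balanced.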

Related Organizations

Istanbul Commerce University
Turkey

Impact by BIP!

- selected citations: 0
- popularity: Average
- influence: Average
- impulse: Average


Fields of Science

engineering and technology

electrical engineering, electronic engineering, information engineering
