
This work compares the effectiveness of one-class classification and PU-learning (Positive-Unlabeled learning) algorithms on different types of data and with different approaches to dereferencing the original classes. The current state of research in this area was reviewed and the compared methods were analyzed. Solutions are proposed for the existing problems of comparing such algorithms. Specific metrics for the experimental comparison were analyzed and selected. For the study, the PU-learning algorithm Difference of Estimated Densities based Positive-Unlabeled Learning (DEDPUL) and the one-class classification algorithm Deep Support Vector Data Description (Deep SVDD) were implemented. An experimental study was performed, and its results were processed and analyzed. The following conclusions were drawn. When the known positive sample is small or contains incorrectly labeled data, the PU-learning algorithm is more effective. In cases of complex dereferencing, where the subclasses that make up the negative and positive classes are taken into account, the algorithms show comparable effectiveness. On textual and multidimensional numeric data, the PU-learning algorithm showed better results. On image data sets, the effectiveness metrics of the two algorithms are similar. For images, it is also shown that combining one-class classification and PU-learning algorithms could in theory improve the overall results. Prospects for further work are outlined.
semi-supervised learning, PU-learning, machine learning, one-class classification
