Powered by OpenAIRE graph
This Research product is the result of merged Research products in OpenAIRE.

General Improvements in Gradient Boosting (Общие улучшения в градиентном бустинге)

Bachelor's thesis

Abstract

This work is devoted to improving the quality of the machine learning algorithm known as gradient boosting. The study addressed the following tasks:

1. Analyze how gradient boosting over decision trees works.
2. Implement gradient boosting over decision trees, with improvements, for the regression problem.
3. Compare the quality of models produced by the proposed implementation with that of well-known available implementations.
4. Study how the implemented improvements affect the quality of the algorithm.

The main improvements studied are:

• Partially randomized feature thresholds.
• Feature histograms with a variable grid.
• A random additive term in the split score during decision tree construction.

As a result, a software module written in C++ for use from Python 3 was developed; it contains an implementation of gradient boosting with these improvements. The quality of models obtained with this implementation was compared with that of well-known available implementations, and the impact of the proposed improvements on model quality was studied.
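The abstract names the three improvements but does not describe how they work. The following is a minimal Python sketch of one possible reading, not the thesis's C++ implementation. The function names, the quantile-based grid, the `keep_fraction` subsampling of thresholds, and the Gaussian `noise_scale` term are all assumptions made for illustration.

```python
import random


def sse(values):
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def quantile_grid(xs, n_bins):
    """Assumed 'variable grid': bin edges placed at empirical quantiles,
    so bins are narrow where the feature values are dense."""
    s = sorted(xs)
    edges = {s[(len(s) * k) // n_bins] for k in range(1, n_bins)}
    return sorted(edges)


def best_split(xs, residuals, n_bins=8, keep_fraction=0.5,
               noise_scale=0.0, rng=None):
    """Pick a threshold for one feature when fitting a tree to residuals.

    Returns (score, threshold, gain), or None if no valid split exists.
    """
    rng = rng or random.Random(0)
    edges = quantile_grid(xs, n_bins)
    if not edges:
        return None
    # Assumed 'partially randomized thresholds': only a random subset
    # of the grid edges is scored; the rest are ignored for this node.
    n_keep = max(1, int(len(edges) * keep_fraction))
    candidates = rng.sample(edges, n_keep)

    parent = sse(residuals)
    best = None
    for t in candidates:
        left = [r for x, r in zip(xs, residuals) if x < t]
        right = [r for x, r in zip(xs, residuals) if x >= t]
        if not left or not right:
            continue
        gain = parent - sse(left) - sse(right)
        # Assumed 'random additive to the split score': zero-mean noise
        # perturbs the ranking of splits with similar gain.
        score = gain + rng.gauss(0.0, noise_scale)
        if best is None or score > best[0]:
            best = (score, t, gain)
    return best
```

With `keep_fraction=1.0` and `noise_scale=0.0` the sketch reduces to an ordinary greedy search over histogram bin edges, which makes it a convenient baseline for checking what each source of randomness changes.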

Keywords

partially randomized decision trees, machine learning, regression problem, decision trees, feature histograms, gradient boosting

  • Impact indicators by BIP!
    citations: 0. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
    popularity: Average. This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
    influence: Average. This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
    impulse: Average. This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.