Powered by OpenAIRE graph
Found an issue? Give us feedback
image/svg+xml art designer at PLoS, modified by Wikipedia users Nina, Beao, JakobVoss, and AnonMoos Open Access logo, converted into svg, designed by PLoS. This version with transparent background. http://commons.wikimedia.org/wiki/File:Open_Access_logo_PLoS_white.svg art designer at PLoS, modified by Wikipedia users Nina, Beao, JakobVoss, and AnonMoos http://www.plos.org/ ZENODOarrow_drop_down
image/svg+xml art designer at PLoS, modified by Wikipedia users Nina, Beao, JakobVoss, and AnonMoos Open Access logo, converted into svg, designed by PLoS. This version with transparent background. http://commons.wikimedia.org/wiki/File:Open_Access_logo_PLoS_white.svg art designer at PLoS, modified by Wikipedia users Nina, Beao, JakobVoss, and AnonMoos http://www.plos.org/
ZENODO
Thesis . 2021
License: CC BY
Data sources: Datacite
image/svg+xml art designer at PLoS, modified by Wikipedia users Nina, Beao, JakobVoss, and AnonMoos Open Access logo, converted into svg, designed by PLoS. This version with transparent background. http://commons.wikimedia.org/wiki/File:Open_Access_logo_PLoS_white.svg art designer at PLoS, modified by Wikipedia users Nina, Beao, JakobVoss, and AnonMoos http://www.plos.org/
ZENODO
Other literature type . 2021
License: CC BY
Data sources: ZENODO
image/svg+xml art designer at PLoS, modified by Wikipedia users Nina, Beao, JakobVoss, and AnonMoos Open Access logo, converted into svg, designed by PLoS. This version with transparent background. http://commons.wikimedia.org/wiki/File:Open_Access_logo_PLoS_white.svg art designer at PLoS, modified by Wikipedia users Nina, Beao, JakobVoss, and AnonMoos http://www.plos.org/
ZENODO
Thesis . 2021
License: CC BY
Data sources: Datacite
versions View all 2 versions
addClaim

Perturbation-Based Confidence Regions and Their Application to Stochastic Bandits

Authors: Kiss, Botond;

Perturbation-Based Confidence Regions and Their Application to Stochastic Bandits

Abstract

Constructing confidence intervals via data perturbation can be used as a system identification approach to estimate the parameter of linear regression models. Such a method is called Sign-perturbed Sums (SPS), which is a novel, nonparametric solution that relies only on mild statistical assumptions and provides non-asymptotic confidence set that contains the unknown system parameter with predetermined probability. The essence of this thesis is the exhaustive analysis and machine learning application of the SPS method. In the first chapter of the thesis, we give a brief insight into stochastic multi-armed bandit (MAB) problems, which are sequential decision making tasks in random environments. MAB problems illustrate the core dilemma of reinforcement learning (which is one of the three branches of machine learning), the exploration vs. exploitation dilemma. We give an overview of the key concepts of bandits and a state of the art approach, called Upper Confidence Bound policy, which is proven to be an efficient (but typically parametric) solution to tackle such stochastic sequential allocation problems. In the second part, we get acquainted with Sign-perturbed Sums (SPS) method, and we present several known properties of SPS confidence regions regarding their shape and size. Furthermore, as our contribution to the existing theory of SPS, we examine the cases of degenerate confidence sets and estimate the probability of their occurrence. We also present a generalization of the SPS method, which allows not only sign but also symmetrical data perturbation, and prove the exact confidence level of the regions provided by symmetrically-perturbed sums method. In addition, we consider further modifications that do not affect the inclusion of the system parameter with the user-chosen confidence probability. In general, the determination of SPS confidence regions in a compact representation is a computationally intensive task. However, in one dimension, the exact confidence region can be calculated efficiently in closed form. In the final, third chapter of the thesis, we examine the one-dimensional SPS method to gain deeper understanding of the construction. Using the alterations proved in the second chapter, we introduce the linear SPS method, which provides semi-infinite confidence intervals. Lastly, we propose a nonparametric, perturbation-based stochastic multi armed bandit algorithm (which relies on linear SPS). To empirically compare it with existing UCB policy, we implemented in Python all the algorithms to be examined in the thesis and ran various simulations.

Keywords

stochastic bandits, confidence regions, randomized algorithms, sign-perturbed sums

  • BIP!
    Impact byBIP!
    selected citations
    These citations are derived from selected sources.
    This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
    0
    popularity
    This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
    Average
    influence
    This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
    Average
    impulse
    This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.
    Average
    OpenAIRE UsageCounts
    Usage byUsageCounts
    visibility views 3
    download downloads 4
  • 3
    views
    4
    downloads
    Powered byOpenAIRE UsageCounts
Powered by OpenAIRE graph
Found an issue? Give us feedback
visibility
download
selected citations
These citations are derived from selected sources.
This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
BIP!Citations provided by BIP!
popularity
This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
BIP!Popularity provided by BIP!
influence
This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
BIP!Influence provided by BIP!
impulse
This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.
BIP!Impulse provided by BIP!
views
OpenAIRE UsageCountsViews provided by UsageCounts
downloads
OpenAIRE UsageCountsDownloads provided by UsageCounts
0
Average
Average
Average
3
4
Green