
A partially observable Markov decision process (POMDP) is a general sequential decision-making model in which the effects of actions are nondeterministic and only partial information about the world state is available. However, finding near-optimal solutions for POMDPs is computationally difficult. Value iteration, a standard algorithm for solving POMDPs, conducts a sequence of dynamic programming (DP) updates to improve the value function. Value iteration is inefficient for two reasons. First, a DP update is expensive because it must account for all belief states in a continuous belief space. Second, value iteration needs to conduct a large number of DP updates before it converges.

This thesis investigates two ways to accelerate value iteration. The work presented centers around the idea of conducting DP updates, and therefore value iteration, over a belief subspace, i.e., a subset of the belief space.

The first use of a belief subspace is to reduce the number of DP updates value iteration needs to converge. We design a computationally cheap procedure that operates over a belief subspace consisting of a finite number of belief states, and we use it as an additional step for improving value functions. Because the procedure yields additional improvements, value iteration conducts fewer DP updates and is therefore more efficient.

The second use of a belief subspace is to reduce the complexity of DP updates. We establish a framework for carrying out value iteration over a belief subspace determined by a POMDP model. Whether this subspace is smaller than the belief space is model-dependent; when it is, value iteration over the belief subspace is expected to be more efficient. Based on this framework, we study three POMDP classes with special problem characteristics and propose different value iteration algorithms for them. (1) An informative POMDP assumes that the agent always has a good idea about the world state. The subspace determined by the model is much smaller than the belief space. Value iteration over the ...
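As a rough, self-contained illustration of the kind of computation the abstract describes (not the thesis's own algorithms), the Python sketch below shows a Bayesian belief update and a single point-based DP backup at one belief state drawn from a finite belief subspace. The array layouts T[a, s, s'], O[a, s', z], and R[s, a] are assumptions made for the example.

```python
import numpy as np

def belief_update(b, a, z, T, O):
    """Bayes belief update: b'(s') is proportional to O[a, s', z] * sum_s T[a, s, s'] * b[s]."""
    bp = O[a, :, z] * (b @ T[a])
    return bp / bp.sum()

def point_based_backup(b, Gamma, T, O, R, gamma):
    """One DP update at a single belief state b (vector over states).

    T[a, s, s1]: transition probabilities, O[a, s1, z]: observation
    probabilities, R[s, a]: rewards, Gamma: list of alpha-vectors
    representing the current value function. Returns the alpha-vector
    that is optimal at b after one backup.
    """
    nA = T.shape[0]
    nZ = O.shape[2]
    best_val, best_vec = -np.inf, None
    for a in range(nA):
        g_a = R[:, a].astype(float).copy()
        for z in range(nZ):
            # Project each alpha-vector back through action a and
            # observation z, then keep the one best at belief b.
            cands = [T[a] @ (O[a, :, z] * alpha) for alpha in Gamma]
            g_a = g_a + gamma * max(cands, key=lambda v: float(b @ v))
        if b @ g_a > best_val:
            best_val, best_vec = float(b @ g_a), g_a
    return best_vec

# Improving the value function over a finite set B of belief states
# (one cheap improvement step, as opposed to a full DP update over
# the continuous belief space):
#   Gamma_next = [point_based_backup(b, Gamma, T, O, R, gamma) for b in B]
```

Restricting the backup to a finite set B is what makes the step computationally cheap: its cost grows with |B|, |Gamma|, and the model sizes, rather than with the (continuous) belief space itself.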
Markov processes, Statistical decision, Dynamic programming, Computer algorithms
