Risk-Sensitive Portfolio Management by Using C51 Algorithm

Thammasorn Harnpadungkij; Warasinee Chaisangmongkon; Phond Phunchongharn

Chiang Mai Journal of Science

Article . 2022 . Peer-reviewed

Data sources: Crossref

Publication: Article, 30 Sep 2022

Publisher: Chiang Mai University

Journal: Chiang Mai Journal of Science, volume 49 (ISSN: 0125-2526, eISSN: 2465-3845)

doi: 10.12982/cmjs.2022.094

Abstract

Financial trading is one of the most popular problems for reinforcement learning in recent years. One of the important challenges is that investment is a multi-objective problem. That is, professional investors do not act solely on expected profit but also carefully consider the potential risk of a given investment. To handle such a challenge, previous studies have explored various kinds of risk-sensitive rewards, for example, the Sharpe ratio as computed by a fixed length of previous returns. This work proposes a new approach to deal with the profit-to-risk tradeoff by applying distributional reinforcement learning to build a risk awareness policy instead of a simple risk-based reward function. Our new policy, termed C51-Sharpe, is to select the action based on the Sharpe ratio computed from the probability mass function of the return. This produces a significantly higher Sharpe ratio and lower maximum drawdown without sacrificing profit compared to the C51 algorithm utilizing a purely profit-based policy. Moreover, it can outperform other benchmarks, such as a Deep Q-Network (DQN) with a Sharpe ratio reward function. Besides the policy, we also studied the effect of using double networks and the choice of exploration strategies with our approach to identify the optimal training configuration. We find that the epsilon-greedy policy is the most suitable exploration for C51-Sharpe and that the use of double network has no significant impact on performance. Our study provides statistical evidence of the efficiency in risk-sensitive policy implemented by using distributional reinforcement algorithms along with an optimized training process.
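The action-selection idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the exact formula, so the sketch assumes a zero risk-free rate, a fixed support of return atoms shared by all actions, and a small constant `eps` to avoid division by zero. The function names `c51_action_scores` and `select_action` are hypothetical.

```python
import numpy as np

def c51_action_scores(probs, atoms, eps=1e-8):
    """Return (mean, sharpe) per action from C51-style categorical outputs.

    probs: array of shape (n_actions, n_atoms); each row is a probability
           mass function over the return atoms (assumed to sum to 1).
    atoms: array of shape (n_atoms,); the fixed support of returns.
    eps:   assumed stabilizer, not taken from the paper.
    """
    probs = np.asarray(probs, dtype=float)
    atoms = np.asarray(atoms, dtype=float)
    mean = probs @ atoms                      # E[Z] for each action
    var = probs @ atoms**2 - mean**2          # E[Z^2] - E[Z]^2
    std = np.sqrt(np.maximum(var, 0.0))       # clip tiny negative round-off
    sharpe = mean / (std + eps)               # assumes risk-free rate of 0
    return mean, sharpe

def select_action(probs, atoms, policy="sharpe"):
    """Greedy action under a profit-based ("mean") or Sharpe-based policy."""
    mean, sharpe = c51_action_scores(probs, atoms)
    return int(np.argmax(sharpe if policy == "sharpe" else mean))

# Illustrative (made-up) distributions over five return atoms.
atoms = np.linspace(-1.0, 1.0, 5)             # [-1, -0.5, 0, 0.5, 1]
probs = np.array([
    [0.3, 0.0, 0.0, 0.0, 0.7],                # higher mean, very dispersed
    [0.0, 0.0, 0.4, 0.6, 0.0],                # lower mean, concentrated
])
mean_choice = select_action(probs, atoms, policy="mean")
sharpe_choice = select_action(probs, atoms, policy="sharpe")
```

In this toy case the profit-based policy prefers the first action (expected return 0.4 versus 0.3), while the Sharpe-based policy prefers the second because its return distribution is far less dispersed. This is the kind of profit-to-risk tradeoff the abstract describes.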

Related Organizations

King Mongkut's University of Technology Thonburi
Thailand
