
The rapid growth of algorithmic trading and financial artificial intelligence has motivated the search for adaptive, data-driven decision-making techniques that can outperform traditional trading strategies. This paper investigates the application of Q-learning, a value-based reinforcement learning algorithm, to stock trading and portfolio management. The trading process is modelled as a Markov Decision Process (MDP) in which states represent market indicators and technical signals, actions correspond to buy, sell, or hold decisions, and rewards are defined in terms of risk-adjusted returns. Using historical stock data obtained from the Yahoo Finance API, a Q-learning agent is implemented and backtested against benchmark strategies such as Buy-and-Hold and Random trading. Experimental results show that the Q-learning framework can achieve competitive performance, with higher cumulative returns and improved Sharpe ratios relative to these benchmarks, while adapting to changing market conditions. The study contributes to the literature by providing a systematic implementation of Q-learning in financial markets and by highlighting both its strengths and its limitations. Challenges such as data non-stationarity, sample efficiency, and risk management are also discussed, and potential extensions to more advanced methods such as Deep Q-Networks and Actor-Critic models are outlined. The findings underscore the potential of reinforcement learning as a paradigm for intelligent financial decision-making and offer insights for traders, researchers, and policymakers.
Algorithmic Trading, Artificial Intelligence, Stock Trading, Q-Learning, Reinforcement Learning, Financial Markets
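The MDP formulation and backtest described above can be illustrated with a minimal tabular Q-learning sketch. It is not the paper's implementation, and every concrete choice below is an assumption made for illustration. The prices are a synthetic geometric random walk standing in for Yahoo Finance data. The state is a hypothetical (momentum sign, current position) pair, and the action set is long-only buy/sell/hold. The reward is the position's next-bar log return minus a quadratic penalty weighted by a hypothetical `risk_aversion` parameter, which is one simple way to make rewards risk-adjusted. The Sharpe ratio assumes a zero risk-free rate and 252 periods per year. The update is the standard Q-learning rule Q(s,a) ← Q(s,a) + α[r + γ max_a' Q(s',a') − Q(s,a)] with epsilon-greedy exploration. All function names are hypothetical.

```python
import math
import random
import statistics

ACTIONS = ("hold", "buy", "sell")  # assumed discrete action set
WINDOW = 5  # hypothetical momentum look-back, in bars


def make_prices(n, seed=0):
    """Synthetic geometric random walk standing in for historical prices."""
    rng = random.Random(seed)
    prices = [100.0]
    for _ in range(n - 1):
        prices.append(prices[-1] * math.exp(rng.gauss(0.0003, 0.01)))
    return prices


def state_of(prices, t, position):
    """Hypothetical state: (1 if WINDOW-bar momentum is positive else 0, position)."""
    trend = 1 if prices[t] > prices[t - WINDOW] else 0
    return (trend, position)


def step_position(position, action):
    """Long-only: buy opens a position, sell closes it, hold leaves it unchanged."""
    if action == "buy":
        return 1
    if action == "sell":
        return 0
    return position


def greedy(q, s):
    # Unseen (state, action) pairs default to 0; ties resolve to the first action.
    return max(ACTIONS, key=lambda a: q.get((s, a), 0.0))


def train_q_learning(prices, episodes=50, alpha=0.1, gamma=0.95,
                     epsilon=0.1, risk_aversion=1.0, seed=0):
    rng = random.Random(seed)
    q = {}  # (state, action) -> estimated action value
    for _ in range(episodes):
        position = 0
        for t in range(WINDOW, len(prices) - 1):
            s = state_of(prices, t, position)
            # Epsilon-greedy exploration.
            a = rng.choice(ACTIONS) if rng.random() < epsilon else greedy(q, s)
            position = step_position(position, a)
            pnl = position * math.log(prices[t + 1] / prices[t])
            # Assumed risk adjustment: quadratic penalty on the step's P&L.
            r = pnl - risk_aversion * pnl ** 2
            s_next = state_of(prices, t + 1, position)
            target = r + gamma * max(q.get((s_next, b), 0.0) for b in ACTIONS)
            old = q.get((s, a), 0.0)
            q[(s, a)] = old + alpha * (target - old)
    return q


def policy_returns(prices, choose):
    """Simple per-bar returns of a policy choose(state) -> action."""
    position, rets = 0, []
    for t in range(WINDOW, len(prices) - 1):
        position = step_position(position, choose(state_of(prices, t, position)))
        rets.append(position * (prices[t + 1] / prices[t] - 1.0))
    return rets


def sharpe_ratio(returns, periods_per_year=252):
    """Annualised Sharpe ratio, assuming a zero risk-free rate."""
    sd = statistics.stdev(returns)
    return 0.0 if sd == 0 else statistics.mean(returns) / sd * math.sqrt(periods_per_year)


# Train on the first 350 bars, backtest the greedy policy on the rest.
prices = make_prices(500)
train, test = prices[:350], prices[350:]
q = train_q_learning(train)
rng = random.Random(1)
results = {
    "Q-learning": policy_returns(test, lambda s: greedy(q, s)),
    "Buy-and-Hold": policy_returns(test, lambda s: "buy"),
    "Random": policy_returns(test, lambda s: rng.choice(ACTIONS)),
}
for name, rets in results.items():
    cumulative = math.prod(1.0 + x for x in rets) - 1.0
    print(f"{name:12s} cumulative={cumulative:+.3f} Sharpe={sharpe_ratio(rets):6.2f}")
```

Evaluating on a held-out tail of the series mirrors the out-of-sample backtest against Buy-and-Hold and Random trading. Because the data here are synthetic, the resulting ranking of the three strategies says nothing about real markets; swapping in real price series and the paper's own state and reward definitions would be needed for any substantive comparison.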
