Conference object, 2017

Adversarial Reinforcement Learning in a Cyber Security Simulation

Elderman, Richard; Pater, Leon J.J.; Thie, Albert S.; Drugan, Madalina M.; Wiering, Marco A.; Filipe, Joaquim; van den Herik, Jaap; Rocha, Ana Paula
  • Access: Open Access
  • Language: English
  • Published: 01 Jan 2017
  • Country: Netherlands
This paper focuses on cyber-security simulations in networks, modeled as a Markov game with incomplete information and stochastic elements. The resulting game is an adversarial sequential decision-making problem played by two agents, the attacker and the defender. Each agent is controlled by a reinforcement learning technique (neural networks, Monte Carlo learning, or Q-learning), and the techniques are pitted against each other to examine their effectiveness against learning opponents. The results show that Monte Carlo learning with the Softmax exploration strategy is the most effective technique both in the defender role and for learning attacking strategies.
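The abstract names Monte Carlo learning with Softmax exploration as the strongest technique in both roles. As a minimal sketch of what that combination involves, the Python below pits two tabular every-visit Monte Carlo learners, each choosing actions from a Softmax (Boltzmann) distribution over its Q-values, against each other in a simultaneous-move attacker/defender game. The game itself (`play_episode`, `N_TARGETS`, `GOAL`, `BASE_DETECT`) is a hypothetical stand-in, not the paper's network simulation, and the temperature, zero-sum terminal rewards, and sample-average update are assumptions chosen for illustration.

```python
import math
import random
from collections import defaultdict


def softmax_probs(q_values, tau):
    """Softmax (Boltzmann) distribution over actions; tau > 0 is the temperature."""
    m = max(q_values)  # subtract the max for numerical stability
    exps = [math.exp((q - m) / tau) for q in q_values]
    total = sum(exps)
    return [e / total for e in exps]


class MonteCarloAgent:
    """Tabular every-visit Monte Carlo control with Softmax exploration."""

    def __init__(self, n_actions, tau=0.5, gamma=1.0):
        self.n_actions = n_actions
        self.tau = tau      # lower temperature -> greedier action choice
        self.gamma = gamma  # discount factor for returns
        self.q = defaultdict(lambda: [0.0] * n_actions)  # Q[state][action]
        self.n = defaultdict(lambda: [0] * n_actions)    # visit counts

    def act(self, state, rng):
        probs = softmax_probs(self.q[state], self.tau)
        return rng.choices(range(self.n_actions), weights=probs)[0]

    def update(self, episode):
        """episode: list of (state, action, reward) tuples in time order."""
        g = 0.0
        for state, action, reward in reversed(episode):
            g = reward + self.gamma * g  # return from this step onward
            self.n[state][action] += 1
            # Incremental sample average of all returns observed for (state, action).
            self.q[state][action] += (g - self.q[state][action]) / self.n[state][action]


# Hypothetical toy game (NOT the paper's network simulation): the attacker needs
# GOAL successful breaches; at each step both agents pick one of N_TARGETS targets.
N_TARGETS = 3
GOAL = 2
BASE_DETECT = [0.1, 0.3, 0.5]  # assumed detection chance for an undefended target


def play_episode(attacker, defender, rng):
    """Play one episode; return both agents' trajectories and the winner."""
    breaches = 0
    att_ep, def_ep = [], []
    while True:
        state = breaches  # the state is the attacker's progress so far
        a = attacker.act(state, rng)
        d = defender.act(state, rng)
        detected = a == d or rng.random() < BASE_DETECT[a]
        if detected:  # defender wins: zero-sum terminal rewards
            att_ep.append((state, a, -1.0))
            def_ep.append((state, d, 1.0))
            return att_ep, def_ep, "defender"
        breaches += 1
        if breaches == GOAL:  # attacker wins
            att_ep.append((state, a, 1.0))
            def_ep.append((state, d, -1.0))
            return att_ep, def_ep, "attacker"
        att_ep.append((state, a, 0.0))
        def_ep.append((state, d, 0.0))


def train(n_episodes=5000, seed=0):
    """Both agents learn simultaneously, each against a learning opponent."""
    rng = random.Random(seed)
    attacker = MonteCarloAgent(N_TARGETS)
    defender = MonteCarloAgent(N_TARGETS)
    wins = {"attacker": 0, "defender": 0}
    for _ in range(n_episodes):
        att_ep, def_ep, winner = play_episode(attacker, defender, rng)
        attacker.update(att_ep)
        defender.update(def_ep)
        wins[winner] += 1
    return attacker, defender, wins


if __name__ == "__main__":
    attacker, defender, wins = train()
    print(wins)
    print("defender policy in state 0:", softmax_probs(defender.q[0], defender.tau))
```

Running the file trains both agents for 5,000 episodes and prints the win counts and the defender's learned action probabilities in the initial state. Because each agent faces an opponent whose policy keeps changing, the sample-average update reacts slowly to new opponent behavior; a constant step size is a common alternative in such non-stationary settings.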
Free text keywords: Reinforcement learning (RL), Security, Simulations, Adversarial setting, Cyber security in networks, Markov games, Software, Control and Systems Engineering, Artificial Intelligence, Monte Carlo method, Adversarial system, Sequential decision, Complete information, Machine learning, Artificial neural network, Computer science, Softmax function
Available from:
  • University of Groningen Digital Archive (Conference object, 2017; provider: NARCIS)
  • Repository TU/e (Conference object, 2017; provider: NARCIS)