
Modern military decision aids must remain reliable under adversarial conditions that typically exceed their developers' testing regimens. This thesis presents a reproducible experimentation framework, built atop the Atlatl hex-grid wargame, that probes artificial intelligence (AI) vulnerabilities through probabilistic scenario generation, global sensitivity analysis, and local adversarial search. To exercise the framework, three reference agents are evaluated on a small scenario: NAMaiV5 and NAMaiV9 (scripted AIs) and Pascal (a neural network trained on the test scenario). Latin Hypercube Sampling generates 20,000 diverse scenarios, each scored by the differential between Blue-vs-Red and Red-vs-Red matches, from which Sobol indices isolate the most influential parameters. A neighborhood-search heuristic then degrades model performance by up to 65%, outperforming differential evolution in efficiency while achieving a larger reduction in score differential. Behavioral heatmaps reveal consistent spatial biases, particularly when terrain near the map center is perturbed. Results show that the scripted AIs fail most under force imbalance and opponent variation, while the neural network is more sensitive to scenario length and unseen terrain clusters. This testbed provides a scalable, interpretable process and toolset for adversarial validation of military AI systems, offering actionable insight into operational robustness.

Distribution Statement A. Approved for public release: Distribution is unlimited.
Outstanding Thesis
Lieutenant, United States Navy
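The sketch below illustrates, in minimal form, the sampling-and-scoring step the abstract describes: Latin Hypercube Sampling over scenario parameters and a Blue-vs-Red minus Red-vs-Red score differential. The parameter names, bounds, agent labels, and the run_match stub are illustrative assumptions, not the thesis code, which drives actual Atlatl matches over 20,000 scenarios.

```python
# Minimal sketch, assuming hypothetical scenario parameters and a stubbed match
# runner; the real framework executes Atlatl games and records their scores.
import numpy as np
from scipy.stats import qmc

# Hypothetical scenario parameters (assumed for illustration).
PARAM_NAMES = ["n_blue_units", "n_red_units", "max_turns", "terrain_seed"]
LOWER = [1, 1, 10, 0]
UPPER = [10, 10, 60, 1000]


def run_match(blue_agent: str, red_agent: str, scenario: np.ndarray) -> float:
    """Placeholder for one Atlatl match; returns Blue's final score.
    A stand-in stub so the sketch runs end to end."""
    rng = np.random.default_rng(int(scenario[-1]))
    return float(rng.normal())


def score_differential(agent: str, baseline: str, scenario: np.ndarray) -> float:
    """Blue-vs-Red score minus the Red-vs-Red baseline score for one scenario."""
    return run_match(agent, baseline, scenario) - run_match(baseline, baseline, scenario)


# Latin Hypercube Sample of the scenario space (20,000 draws in the thesis).
sampler = qmc.LatinHypercube(d=len(PARAM_NAMES), seed=42)
unit_sample = sampler.random(n=1000)            # use n=20_000 to match the thesis
scenarios = qmc.scale(unit_sample, LOWER, UPPER)

diffs = np.array(
    [score_differential("agent_under_test", "red_baseline", s) for s in scenarios]
)
print(f"mean score differential over sample: {diffs.mean():.3f}")
```

The resulting per-scenario differentials are the quantity a variance-based sensitivity analysis (Sobol indices) and the local adversarial search would operate on.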
adversarial machine learning (AML), artificial intelligence (AI), central processing unit (CPU), comma-separated values (CSV), derivative-free optimization (DFO), diametrical risk minimization (DRM), differential evolution (DE), empirical risk minimization (ERM), JavaScript object notation (JSON), Latin Hypercube Sampling (LHS), machine learning (ML), Monte Carlo tree search (MCTS), Office of Naval Research (ONR), reinforcement learning (RL), stochastic gradient descent (SGD)
