The AInception Dataset

The AInception dataset contains system, network, and cyber-physical logs generated from three simulated military cyber-defence storylines: SL100, SL300, and SL700. These were produced within the European Defence Fund project AInception (GA 101103385) and model realistic operational environments involving benign behaviour, adversarial activity, and multi-step attack chains.

Simulation of storylines

In AInception, a military scenario with six connected storylines has been developed. These are described in this report. Three of these storylines are used in this dataset:

SL100: UAV border surveillance

In SL100, national borders are monitored with unmanned aerial vehicles. The red actor initially compromises the ICT systems of the facility that controls the drones for intelligence purposes. This access is later used to crash a drone as a political signal. The included SL100 dataset/simulation contains benign UAV patrol missions followed by the compromise of a Windows operator machine and an attack leading to UAV mission interruption. Included are system logs, Suricata alerts and netflows, UAV flight logs, and a STIX-based infrastructure graph.

SL300: Non-combatant evacuation operation

In SL300, the evacuation of diplomatic and government personnel is conducted through a land-based non-combatant evacuation operation. A battlegroup is deployed and sends armoured vehicles to escort buses with personnel to an airport for air evacuation. Compromised vehicle systems are leveraged to gain intelligence and disrupt the operation. The included SL300 dataset/simulation contains a Windows-based infrastructure, Active Directory, C2 systems, email servers, and simulated vehicles communicating with the HQ over satellite. The dataset includes eight multi-day simulations featuring benign operational behaviour and multiple attack variants affecting initial access, lateral movement, and service disruption. Logs include Windows Event/Sysmon, Linux audit logs, Suricata NetFlow, simulated user actions, and attack tool telemetry.

SL700: Battlegroup at home base

In SL700, the garrison where the battlegroup of SL300 has its home base is compromised before deployment. A cyber attack against physical access and surveillance systems is used to enable physical access for the red actor. This access is leveraged in SL300. The included SL700 dataset/simulation contains surveillance systems, firewalls, routers, and specialised infrastructure. It includes simulations of reconnaissance, exploitation, privilege escalation, persistence techniques, and manipulation of firewall rules to disrupt CCTV video feeds. Each run provides raw host logs, network data, AttackMate timelines, and labelled subsets.

Content

Across the three storylines, the dataset includes:

- Host logs (Windows Event Logs, Sysmon, Linux audit logs, application logs)
- Network telemetry (Suricata alerts, NetFlow, PCAP fragments)
- UAV flight and mission logs (SL100)
- Simulated user activity traces (SL300)
- Structured attack timelines with MITRE ATT&CK mapping
- Infrastructure descriptions in STIX 2.1 graph format (SL100, SL700)
- Indicators of Compromise (IOCs) and STIX objects (SL300)
- Labelled/annotated malicious vs benign events (where available)
- Alerts
- Alert graphs (SL300; variant 2 and variant 5)
- Knowledge graph (SL300; variant 5)
- Attack-defence graphs in MAL (SL300)

Scale

The dataset spans 15 complete simulations, each a variant of one of the above storylines. Individual runs range from hours (SL700) to six days (SL300). Depending on the storyline, the logs of a run total tens to hundreds of millions of events.
Purpose

This dataset is intended for cybersecurity research, particularly for the military domain, including:

- anomaly detection
- intrusion detection
- cyber-physical system security
- graph-based threat analysis
- behavioural modelling and concept drift research
- alert analysis and triage
- situational awareness
- response generation

It provides realistic, diverse, and high-fidelity datasets aligned with operational military scenarios. A PDF file is included with additional details on the dataset and the underlying simulation. Each variant/simulation (represented as a separate ZIP file) contains a README file with further information.
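
Because each variant ships as its own ZIP file with a README, a first pass over a download could simply collect those READMEs. The following is a minimal Python sketch, not part of the dataset itself. It assumes the archives sit together in one local directory and that the README file name starts with "readme" (in any case, with any extension); both the helper names and these naming conventions are hypothetical and should be checked against the actual archives.

```python
import zipfile
from pathlib import Path


def find_readmes(archive_path):
    """Return the names of README-like file entries inside one ZIP archive."""
    with zipfile.ZipFile(archive_path) as archive:
        return [
            name
            for name in archive.namelist()
            # Assumption: the README file name starts with "readme"
            # (case-insensitive); directory entries are skipped.
            if not name.endswith("/")
            and Path(name).name.lower().startswith("readme")
        ]


def read_readmes(dataset_dir):
    """Map each ZIP archive in dataset_dir to the text of its README entries."""
    result = {}
    for archive_path in sorted(Path(dataset_dir).glob("*.zip")):
        with zipfile.ZipFile(archive_path) as archive:
            result[archive_path.name] = {
                # Decode leniently, since the README encoding is not documented here.
                name: archive.read(name).decode("utf-8", errors="replace")
                for name in find_readmes(archive_path)
            }
    return result
```

Calling `read_readmes("path/to/download")` returns a dictionary keyed by archive file name, which makes it easy to skim the per-variant documentation without unpacking the much larger log files.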
