
This repository contains the pre-computed artifacts required to reproduce the experimental results of the Theseus model presented in the paper "Theseus: Navigating the Labyrinth of Evaluation Bias in Provenance-based Intrusion Detection". These artifacts let researchers bypass the computationally intensive steps of graph construction and model training and directly reproduce the Theseus evaluation metrics (Table 2 in the paper) using the exact checkpoints reported.

Contents

- Graph Construction Cache: Pre-processed PyTorch Geometric (PyG) data objects for the DARPA TC E3 datasets (Theia, Cadets, Trace, Fivedirections). These files contain the fully parsed provenance graphs with temporal isolation applied, ready for loading.
- Model Checkpoints: The specific trained model weights (.pt files) for Theseus used to generate the final results reported in the paper.
- Word2Vec Embeddings: Domain-specific semantic embeddings trained on the training splits of each dataset, required to embed the node features.

Usage

These artifacts are designed to be used in conjunction with the Theseus source code.

1. Download the archive.
2. Extract the archive directly into the project root directory. This creates the cache/ and checkpoints/ folders with the necessary files.
3. Run the evaluation script to verify the results reported in the paper:

    ./scripts/reproduce_results.sh

Datasets Covered

DARPA Transparent Computing E3 (Theia, Cadets, Fivedirections, Trace)
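
Before running the evaluation script, it can help to confirm that the archive was extracted into the right place. The following is a minimal sketch, not part of the Theseus code base: it only checks for the cache/ and checkpoints/ folders named in the usage steps and counts .pt files under checkpoints/. The function name `check_artifacts` and the recursive search are assumptions made for illustration, since the internal layout of the folders is not documented here.

```python
from pathlib import Path

# Folder names are taken from the usage steps; everything else is an assumption.
EXPECTED_DIRS = ("cache", "checkpoints")


def check_artifacts(project_root="."):
    """Report which expected folders are missing and how many .pt files exist."""
    root = Path(project_root)
    missing = [name for name in EXPECTED_DIRS if not (root / name).is_dir()]
    checkpoints_dir = root / "checkpoints"
    # Search recursively because the layout inside checkpoints/ is not documented.
    if checkpoints_dir.is_dir():
        num_checkpoints = sum(1 for _ in checkpoints_dir.rglob("*.pt"))
    else:
        num_checkpoints = 0
    return {"missing_dirs": missing, "num_checkpoints": num_checkpoints}


if __name__ == "__main__":
    report = check_artifacts(".")
    if report["missing_dirs"]:
        print("Missing folders:", ", ".join(report["missing_dirs"]))
    else:
        print("Found cache/ and checkpoints/ with",
              report["num_checkpoints"], ".pt checkpoint file(s)")
```

Run from the project root, a report with no missing folders and at least one checkpoint suggests the extraction step worked. It does not validate the contents of the files themselves.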
