Process Behavior Corpus and Benchmarking Datasets

A corpus of process behaviors and benchmarking datasets for semantics-aware process mining tasks.

Files:

process_behavior_corpus.csv: The text corpus, which contains the behavior allowed by process models as sequences of activities (column string_traces).

T_SAD.csv: A benchmark dataset generated from the corpus to assess the following task: Given a trace σ, decide whether σ is a valid execution of the underlying process, without knowing the behavior allowed in the process. Each row contains a trace (column trace) with a corresponding label (column anomalous) indicating whether the trace represents a valid execution of the underlying process. The set of activities that can occur in the process is also given (column unique_activities).

A_SAD.csv: A benchmark dataset generated from the corpus to assess the following task: Given an eventually-follows relation ef = a ≺ b of a trace σ, decide whether ef represents a valid execution order of the two activities a and b that are executed in a process, without knowing the behavior allowed in the process. Each row contains an eventually-follows relation (column eventually_follows) with a corresponding label (column out_of_order) indicating whether the two activities of the relation were executed in an invalid order (TRUE) or in a valid order (FALSE) according to the underlying process model. The set of activities that can occur in the process is also given (column unique_activities).

S_NAP.csv: A benchmark dataset generated from the corpus to assess the following task: Given an event log L and a prefix p_k of length k, with 1 < k, predict the next activity a_{k+1}. Each row contains a trace prefix (column prefix) with a corresponding next activity (column next), that is, the activity that should be performed after the last activity of the prefix according to the trace from which the prefix was generated. The set of activities that can occur in the process is also given (column unique_activities).
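To make the A_SAD and S_NAP task definitions more concrete, the following minimal Python sketch derives eventually-follows relations and prefix/next-activity pairs from a single trace. It follows the textbook definitions only; the function names and the example trace are hypothetical, and the sketch is not the procedure that was used to generate the benchmark files.

```python
from itertools import combinations


def eventually_follows(trace):
    """Return all pairs (a, b) where b occurs somewhere after a in the trace."""
    return {(trace[i], trace[j]) for i, j in combinations(range(len(trace)), 2)}


def prefix_next_pairs(trace, min_len=2):
    """Return (prefix, next_activity) pairs for every prefix of length k >= min_len.

    min_len=2 mirrors the condition 1 < k from the task description.
    """
    return [(trace[:k], trace[k]) for k in range(min_len, len(trace))]


# Hypothetical example trace (not taken from the corpus).
trace = ["create order", "check stock", "ship goods", "send invoice"]

ef_pairs = eventually_follows(trace)
samples = prefix_next_pairs(trace)
```

For the example trace, ef_pairs contains ("create order", "ship goods") but not ("ship goods", "create order"), and samples pairs the two-activity prefix with "ship goods" as the next activity.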
S-PMD.csv: A benchmark dataset generated from the corpus to assess the following tasks: (1) Given a set of possible activities (column unique_activities), generate a directly-follows graph (column dfg) that captures the trace semantics of the process model. (2) Given a set of possible activities (column unique_activities), generate a simple process tree (column pt) that captures the trace semantics of the process model.

Reference and legal info: The corpus and the benchmark datasets are generated using the SAP-SAM dataset: Kampik, T., Warmuth, C., Sola, D., Schäfer, B., Axworthy, L., Ivarsson, E., Ouda, K., & Eickhoff, D. (2022). SAP Signavio Academic Models (0.5.1) [Data set]. Zenodo. https://doi.org/10.5281/zenodo.7012043

The SAP-SAM dataset is published with a specific license (see "Rights"), which therefore also applies to the data published in this record.

The datasets and associated evaluation experiments are described in this paper. In this repository you find the code and raw results of evaluation experiments using various open-source LLMs to solve the tasks.
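As an illustration of the directly-follows graph used in the S-PMD task, the sketch below builds the directly-follows relation from a list of traces using the standard definition (activity b directly follows a if b occurs immediately after a in some trace). The serialization of the dfg column is not specified in this description, so the edge-count dictionary, the function name, and the example traces are assumptions made for illustration only.

```python
from collections import Counter


def directly_follows_graph(traces):
    """Count how often activity b immediately follows activity a across all traces."""
    edges = Counter()
    for trace in traces:
        # Pair each activity with its immediate successor.
        for a, b in zip(trace, trace[1:]):
            edges[(a, b)] += 1
    return edges


# Hypothetical example traces (not taken from the corpus).
traces = [
    ["receive request", "check request", "approve request"],
    ["receive request", "check request", "reject request"],
]

dfg = directly_follows_graph(traces)
```

Here the edge ("receive request", "check request") is counted twice, while each of the two edges leaving "check request" is counted once.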
