
Description: This dataset provides an object-centric event log (OCEL) representation of the publicly available Enron email corpus. The OCEL format allows for a richer analysis of interconnected processes and objects, making it particularly suitable for advanced process mining techniques, communication pattern analysis, and social network exploration.

The event logs were generated from a pre-processed CSV version of the Enron emails using a custom Python script leveraging the PM4Py library. The script parses individual emails to extract key information, including:

Timestamps: Derived from the 'Date' field of emails, parsed into timezone-aware datetime objects.
Activities: Inferred from email subject prefixes (e.g., "Re:" becomes "Response", "Fw:" becomes "Forwarding", "Invitation:" becomes "Invitation"). Emails without recognized prefixes are assigned a "Default" activity.
Objects: Two primary object types are identified:
  EMAILADDRESS: Extracted from 'From', 'To', and 'Cc' fields.
  MESSAGEID: Extracted from 'Message-ID', 'In-Reply-To', and 'References' fields, prefixed with "MID_" in the OCEL to ensure unique object identifiers across types.
Attributes: Event attributes include the original cleaned subject and content of the email.
Relationships: Events (emails) are linked to EMAILADDRESS objects with qualifiers 'FROM', 'TO', or 'CC'. Events are linked to MESSAGEID objects with qualifiers 'MESSAGEID' (for the email's own ID), 'INREPLYTO', or 'REFERENCES' to trace conversational threads.

To accommodate various analytical needs and computational resources, the dataset is provided in three distinct checkpoints:

Top 10,000 Emails: An OCEL generated from the first 10,000 emails processed.
Top 100,000 Emails: An OCEL generated from the first 100,000 emails processed.
All Emails: An OCEL generated from all emails processed by the script from the input emails.csv file.
Each checkpoint is available in the .jsonocel format (OCEL 2.0 standard), ready for use with PM4Py and other OCEL-compatible process mining tools. This dataset can be valuable for researchers and practitioners seeking to apply object-centric process discovery, conformance checking, and enhancement techniques to a large, real-world communication log.

Keywords: Object-Centric Event Log, OCEL, Process Mining, Enron Dataset, Email Analysis, Communication Networks, Social Network Analysis, PM4Py
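
The extraction rules in the description above can be illustrated with a minimal sketch using only the Python standard library. This is not the dataset's generation script: the function names (infer_activity, email_to_event), the case-insensitive prefix matching, the comma-splitting of address fields, and the whitespace-splitting of message IDs are all assumptions made for illustration. Only the prefix-to-activity mapping, the object types, the qualifiers, and the "MID_" prefix come from the description.

```python
import re
from email.utils import parsedate_to_datetime

# Prefix-to-activity mapping from the description; case-insensitive
# matching is an assumption of this sketch.
PREFIX_ACTIVITIES = [
    (re.compile(r"^\s*re:", re.IGNORECASE), "Response"),
    (re.compile(r"^\s*fw:", re.IGNORECASE), "Forwarding"),
    (re.compile(r"^\s*invitation:", re.IGNORECASE), "Invitation"),
]


def infer_activity(subject):
    """Return the activity label for a subject line, or "Default"."""
    for pattern, activity in PREFIX_ACTIVITIES:
        if pattern.match(subject or ""):
            return activity
    return "Default"


def split_addresses(value):
    """Split a comma-separated address field (assumed layout)."""
    return [a.strip() for a in (value or "").split(",") if a.strip()]


def split_message_ids(value):
    """Split a whitespace-separated list of message IDs (assumed layout)."""
    return (value or "").split()


def email_to_event(headers):
    """Build a hypothetical event record from a dict of email headers.

    Each relation is a tuple (object_id, object_type, qualifier).
    """
    relations = []
    for field, qualifier in (("From", "FROM"), ("To", "TO"), ("Cc", "CC")):
        for address in split_addresses(headers.get(field)):
            relations.append((address, "EMAILADDRESS", qualifier))
    for field, qualifier in (
        ("Message-ID", "MESSAGEID"),
        ("In-Reply-To", "INREPLYTO"),
        ("References", "REFERENCES"),
    ):
        for message_id in split_message_ids(headers.get(field)):
            # The "MID_" prefix keeps message IDs distinct from address IDs.
            relations.append(("MID_" + message_id, "MESSAGEID", qualifier))
    return {
        # Yields a timezone-aware datetime when the header carries an offset.
        "timestamp": parsedate_to_datetime(headers["Date"]),
        "activity": infer_activity(headers.get("Subject")),
        "relations": relations,
    }
```

Under these assumptions, a reply carrying both 'Message-ID' and 'In-Reply-To' headers yields one "Response" event linked to two MESSAGEID objects, which is what allows conversational threads to be traced across events.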
