Powered by OpenAIRE graph
ZENODO
Dataset . 2025
License: CC BY
Data sources: ZENODO
ZENODO
Dataset . 2026
License: CC BY
Data sources: ZENODO
ZENODO
Dataset . 2026
License: CC BY
Data sources: Datacite
ZENODO
Dataset . 2025
License: CC BY
Data sources: Datacite
Versions: 3

Object-Centric Event Log (OCEL) of the Enron Email Dataset

Authors: Berti, Alessandro

Abstract

Description: This dataset provides an object-centric event log (OCEL) representation of the publicly available Enron email corpus. The OCEL format allows for a richer analysis of interconnected processes and objects, making it particularly suitable for advanced process mining techniques, communication pattern analysis, and social network exploration.

The event logs were generated from a pre-processed CSV version of the Enron emails using a custom Python script leveraging the PM4Py library. The script parses individual emails to extract key information, including:

  • Timestamps: Derived from the 'Date' field of emails, parsed into timezone-aware datetime objects.
  • Activities: Inferred from email subject prefixes (e.g., "Re:" becomes "Response", "Fw:" becomes "Forwarding", "Invitation:" becomes "Invitation"). Emails without recognized prefixes are assigned a "Default" activity.
  • Objects: Two primary object types are identified:
      • EMAILADDRESS: Extracted from 'From', 'To', and 'Cc' fields.
      • MESSAGEID: Extracted from 'Message-ID', 'In-Reply-To', and 'References' fields, prefixed with "MID_" in the OCEL to ensure unique object identifiers across types.
  • Attributes: Event attributes include the original cleaned subject and content of the email.
  • Relationships: Events (emails) are linked to EMAILADDRESS objects with qualifiers 'FROM', 'TO', or 'CC'. Events are linked to MESSAGEID objects with qualifiers 'MESSAGEID' (for the email's own ID), 'INREPLYTO', or 'REFERENCES' to trace conversational threads.

To accommodate various analytical needs and computational resources, the dataset is provided in three distinct checkpoints:

  • Top 10,000 Emails: An OCEL generated from the first 10,000 emails processed.
  • Top 100,000 Emails: An OCEL generated from the first 100,000 emails processed.
  • All Emails: An OCEL generated from all emails processed by the script from the input emails.csv file.
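
The extraction rules listed above (timestamps, activities, objects, and qualifiers) can be illustrated with a minimal sketch. This is not the author's script, which is not reproduced on this page. The function names `infer_activity` and `email_to_event`, the case-insensitive prefix matching, the stripping of angle brackets from message IDs, and the dictionary layout of the returned event are all assumptions made for illustration; only the prefix-to-activity mappings, the "MID_" prefix, and the qualifier names come from the description.

```python
from email.utils import getaddresses, parsedate_to_datetime

# The three mappings named in the description; matching is case-insensitive
# here, which is an assumption of this sketch.
PREFIX_TO_ACTIVITY = {
    "re:": "Response",
    "fw:": "Forwarding",
    "invitation:": "Invitation",
}

ADDRESS_FIELDS = (("From", "FROM"), ("To", "TO"), ("Cc", "CC"))
MESSAGE_ID_FIELDS = (
    ("Message-ID", "MESSAGEID"),
    ("In-Reply-To", "INREPLYTO"),
    ("References", "REFERENCES"),
)


def infer_activity(subject: str) -> str:
    """Map a subject prefix to an activity name, falling back to "Default"."""
    lowered = subject.strip().lower()
    for prefix, activity in PREFIX_TO_ACTIVITY.items():
        if lowered.startswith(prefix):
            return activity
    return "Default"


def email_to_event(headers: dict) -> dict:
    """Turn one email's headers into a hypothetical event record."""
    relationships = []

    # EMAILADDRESS objects, qualified by the header they came from.
    for field, qualifier in ADDRESS_FIELDS:
        for _name, address in getaddresses([headers.get(field, "")]):
            if address:  # skip empty results for missing headers
                relationships.append({"objectId": address, "qualifier": qualifier})

    # MESSAGEID objects, prefixed with "MID_" so they cannot collide with
    # identifiers of other object types.
    for field, qualifier in MESSAGE_ID_FIELDS:
        for message_id in headers.get(field, "").split():
            object_id = "MID_" + message_id.strip("<>")  # bracket stripping is assumed
            relationships.append({"objectId": object_id, "qualifier": qualifier})

    return {
        "type": infer_activity(headers.get("Subject", "")),
        # Timezone-aware when the Date header carries a UTC offset.
        "time": parsedate_to_datetime(headers["Date"]),
        "relationships": relationships,
    }
```

Calling `email_to_event` on a dictionary with 'Date', 'Subject', 'From', 'To', and 'Message-ID' keys yields one event whose relationships carry the qualifiers described above. A real pipeline would also attach the cleaned subject and content as event attributes.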
Each checkpoint is available in the .jsonocel format (OCEL 2.0 standard), ready for use with PM4Py and other OCEL-compatible process mining tools. This dataset can be valuable for researchers and practitioners seeking to apply object-centric process discovery, conformance checking, and enhancement techniques to a large, real-world communication log.

Keywords: Object-Centric Event Log, OCEL, Process Mining, Enron Dataset, Email Analysis, Communication Networks, Social Network Analysis, PM4Py
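
As a rough illustration of how the qualifiers support thread tracing, the sketch below works on an already-parsed log using only the standard library. It assumes the files follow the OCEL 2.0 JSON layout, with a top-level "events" list whose entries carry "id" and a "relationships" list of "objectId"/"qualifier" pairs; the actual key names in these files should be checked before relying on it. The function names are hypothetical, and PM4Py's own OCEL readers would be the usual route in practice.

```python
import json
from collections import Counter


def load_jsonocel(path: str) -> dict:
    """Read a .jsonocel file as plain JSON (path supplied by the caller)."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def qualifier_counts(ocel: dict) -> Counter:
    """Count event-to-object relationships per qualifier (FROM, TO, CC, ...)."""
    counts = Counter()
    for event in ocel.get("events", []):
        for rel in event.get("relationships", []):
            counts[rel["qualifier"]] += 1
    return counts


def reply_edges(ocel: dict) -> list[tuple[str, str]]:
    """Return (reply event id, parent event id) pairs.

    An email owns the MESSAGEID object it is linked to with qualifier
    'MESSAGEID'; a reply points at that same object with 'INREPLYTO'.
    """
    owner = {}
    for event in ocel.get("events", []):
        for rel in event.get("relationships", []):
            if rel["qualifier"] == "MESSAGEID":
                owner[rel["objectId"]] = event["id"]

    edges = []
    for event in ocel.get("events", []):
        for rel in event.get("relationships", []):
            # Parents outside the checkpoint are silently skipped.
            if rel["qualifier"] == "INREPLYTO" and rel["objectId"] in owner:
                edges.append((event["id"], owner[rel["objectId"]]))
    return edges
```

Starting with the smallest checkpoint keeps memory use modest, since `json.load` reads the whole file at once.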

  • BIP!
    Impact by BIP!
    selected citations
    These citations are derived from selected sources.
    This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
    0
    popularity
    This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
    Average
    influence
    This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
    Average
    impulse
    This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.
    Average