
This release provides InciText v1.0, an incident-centric text dataset released for research and academic use.

Contents

The dataset includes three components packaged together:

- Raw and processed text corpora (DataSet_Attribute_Extraction/)
- Structured annotation files (Annotation/)
- Derived, normalized datasets used in experiments (Derived_Datasets/)

The Derived_Datasets/ folder contains the paper-faithful PostgreSQL exports and is the recommended entry point for reproducing reported results.

Scope

InciText includes incident reports, press releases, newspaper articles, and synthetic or generated narratives used for attribute extraction and retrieval research. Some documents were provided in privacy-reviewed or historical form. Users should follow applicable ethical and institutional guidelines.

Dataset Composition

The frequency of each data type in the FemmIR-text corpus is:

- Newspaper articles: 300
- Officer narratives: 40
- Press releases: 13
- Dispatch reports: 5
- Synthetic narratives: 1500

Citation

If you use this dataset, please cite:

@misc{solaiman2025modularunsupervisedframeworkattribute,
  title={A Modular Unsupervised Framework for Attribute Recognition from Unstructured Text},
  author={KMA Solaiman},
  year={2025},
  eprint={2507.03949},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2507.03949},
}
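The composition counts listed under Dataset Composition can be tallied with a short Python sketch. The variable names are illustrative and not part of the dataset, and the counts are hard-coded from the list rather than read from the release files.

```python
# Document counts per type, copied from the Dataset Composition list.
# These are hard-coded for illustration, not loaded from the release.
composition = {
    "Newspaper articles": 300,
    "Officer narratives": 40,
    "Press releases": 13,
    "Dispatch reports": 5,
    "Synthetic narratives": 1500,
}

# Total number of documents and each type's share of that total.
total = sum(composition.values())
shares = {name: count / total for name, count in composition.items()}

# Print the types from most to least frequent, with count and percentage.
for name, count in sorted(composition.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:<22}{count:>6}{shares[name]:>8.1%}")
print(f"{'Total':<22}{total:>6}")
```

The listed counts sum to 1,858 documents, of which synthetic narratives make up roughly 81%, so the corpus is dominated by generated text.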
incident reports, annotations, dataset, information retrieval, attribute extraction
