Downloads provided by UsageCounts
The MarkupMnA dataset is a corpus of 151 merger and acquisition agreements annotated with section titles, section numbers, page numbers, and more, based on HTML filings by US public companies retrieved from the SEC EDGAR database. We treat section title annotation as a sequence labeling task and therefore use the BEIOS tagging scheme when generating our annotations. The dataset contains over 70,000 labels excluding outside labels and over 465,000 labels including them. We add annotations to the contracts in an already widely used dataset, MAUD, an expert-annotated reading comprehension dataset. The broad objective of our work is to make progress toward computationally efficient hierarchical representations of long documents, specifically legal contracts. We hope that our annotations can be used in conjunction with MAUD to advance legal NLP research. Please see [Rao et al., 2023] and the corresponding GitHub repository for more details regarding this dataset.
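To illustrate the sequence labeling framing described above, here is a minimal sketch of how labeled spans could be turned into per-token BEIOS tags (read here as Begin, End, Inside, Outside, Single). The function name `beios_tags`, the label names `NUMBER` and `TITLE`, the `B-`/`E-` tag string format, and the toy tokenization are all assumptions made for illustration; the actual tag names and file format used in the dataset may differ, so please consult the GitHub repository.

```python
from typing import List, Tuple


def beios_tags(n_tokens: int, spans: List[Tuple[int, int, str]]) -> List[str]:
    """Convert labeled spans into one BEIOS tag per token.

    Each span is (start, end, label) with `end` exclusive.
    Tokens not covered by any span keep the outside tag "O".
    """
    tags = ["O"] * n_tokens
    for start, end, label in spans:
        if end - start == 1:
            # A one-token span gets the Single tag.
            tags[start] = f"S-{label}"
        else:
            tags[start] = f"B-{label}"
            for i in range(start + 1, end - 1):
                tags[i] = f"I-{label}"
            tags[end - 1] = f"E-{label}"
    return tags


# Hypothetical tokenized line from an agreement (not taken from the dataset).
tokens = ["Section", "1.1", "Definitions", ".", "As", "used", "herein"]
# Hypothetical labels: a two-token section number and a one-token section title.
spans = [(0, 2, "NUMBER"), (2, 3, "TITLE")]

tags = beios_tags(len(tokens), spans)
non_outside = sum(tag != "O" for tag in tags)

for token, tag in zip(tokens, tags):
    print(f"{token}\t{tag}")
```

In this toy example three of the seven tokens carry a non-outside tag, which mirrors the distinction the description draws between label counts excluding and including outside labels.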
SEC filings, Segmentation, markup, MarkupMnA, document understanding, Markup-Based Segmentation, Markup Based Segmentation, multimodal, MAUD, document segmentation, legal contracts
| Indicator | Description | Value |
| --- | --- | --- |
| citations | This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically). | 0 |
| popularity | This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network. | Average |
| influence | This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically). | Average |
| impulse | This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network. | Average |
| views | | 1K |
| downloads | | 161 |

Views provided by UsageCounts