Text data used in the article: Tatsuya Haga, Yohei Oseki, Tomoki Fukai, "A unified neural representation model for spatial and semantic computations" (bioRxiv preprint, doi: https://doi.org/10.1101/2023.05.11.540307). Code and usage instructions for the data are available at https://github.com/TatsuyaHaga/DSI_codes

Main dataset (enwiki_processed_pickle): This file contains preprocessed text data from 100,000 articles randomly sampled from the English Wikipedia dump taken on 22-May-2020 (https://dumps.wikimedia.org/enwiki/latest/).

Additional dataset (wikitext103train_processed_pickle): This file contains preprocessed text data based on the WikiText-103 dataset (Stephen Merity, Caiming Xiong, James Bradbury, and Richard Socher. 2016. Pointer Sentinel Mixture Models. http://arxiv.org/abs/1609.07843).

Both datasets have already been preprocessed: all characters were lowercased, punctuation characters were removed, and the text was tokenized into words. The data are stored in Python pickle format. We publish the data under CC-BY-SA, following the licenses of the original datasets.
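As a rough illustration of the preprocessing described above (lowercasing, punctuation removal, tokenization) and of reading a pickle file, a minimal Python sketch might look like the following. The helper names preprocess and load_pickle are hypothetical, and the whitespace tokenizer and ASCII-only punctuation set are assumptions; the authors' actual procedure is in their repository and may differ. The structure of the unpickled object is not stated here, so inspect it after loading.

```python
import pickle
import string

# Assumption: punctuation means ASCII punctuation; the original pipeline
# may treat Unicode punctuation or numbers differently.
_PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def preprocess(text):
    """Lowercase, strip ASCII punctuation, and split on whitespace."""
    return text.lower().translate(_PUNCT_TABLE).split()


def load_pickle(path):
    """Load a pickle file. Only unpickle files from a source you trust."""
    with open(path, "rb") as f:
        return pickle.load(f)


if __name__ == "__main__":
    print(preprocess("Hello, World! It's 2020."))
    # ['hello', 'world', 'its', '2020']
```

For example, after downloading the main dataset, data = load_pickle("enwiki_processed_pickle") followed by print(type(data)) shows what kind of object the file holds before any further processing.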

Related Organizations

National Institute of Information and Communications Technology
Japan
