Temporal Validity Change Prediction - Dataset

This dataset contains data for temporal validity change prediction (TVCP), an NLP task that will be defined in an upcoming publication.

The dataset consists of five columns:
target - A tweet ID. The tweet text is not included and must be rehydrated manually via the Twitter API.
follow_up - A synthetic follow-up tweet that semantically relates to the target tweet.
context_only_tv - The expected temporal validity duration of the target tweet when read in isolation.
combined_tv - The expected temporal validity duration of the target tweet when read together with the follow-up tweet.
change - The TVCP task label, i.e., whether the information in the follow-up tweet decreases, leaves unchanged (neutral), or increases the temporal validity duration of the target tweet.

The duration labels (context_only_tv, combined_tv) are class indices into the following list of duration classes, in this order: [no time-sensitive information, less than one minute, 1-5 minutes, 5-15 minutes, 15-45 minutes, 45 minutes - 2 hours, 2-6 hours, more than 6 hours, 1-3 days, 3-7 days, 1-4 weeks, more than one month]

Several dataset splits are provided:
"dataset.csv" contains the full dataset.
"train.csv", "val.csv", and "test.csv" contain an 80-10-10 train-val-test split.
"train[0-4].csv" and "test[0-4].csv" contain the training and test data, respectively, for each of the 5 folds of 5-fold cross-validation. Each train file contains 80% of the data and each test file 20%.

To replicate the original experiments, sort each train file by the preprocessed target tweet text, then take the first 12.5% of target tweets as validation data. Since 12.5% of the 80% training portion is 10% of the full dataset, this yields a 70-10-20 train-val-test split.
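The label encoding and the validation-split replication described above can be sketched in Python using only the standard library. This is a minimal sketch under several assumptions that the dataset description does not state: the CSV headers match the column names listed above; the duration labels are 0-based indices into the class list; "the first 12.5% of target tweets" refers to distinct target tweet IDs (so all follow-ups of a held-out tweet go to validation), with the count rounded to the nearest integer; and the text preprocessing, here the hypothetical `preprocess` function (lowercasing and whitespace normalization), may differ from the one used in the original experiments. Because the CSV files contain only tweet IDs, the caller must supply the rehydrated texts, here via a hypothetical `texts` dictionary mapping IDs to text.

```python
import csv
from typing import Dict, List, Tuple

Row = Dict[str, str]

# Duration classes in the order listed above. ASSUMPTION: the values in
# context_only_tv / combined_tv are 0-based positions into this list.
TV_CLASSES: List[str] = [
    "no time-sensitive information",
    "less than one minute",
    "1-5 minutes",
    "5-15 minutes",
    "15-45 minutes",
    "45 minutes - 2 hours",
    "2-6 hours",
    "more than 6 hours",
    "1-3 days",
    "3-7 days",
    "1-4 weeks",
    "more than one month",
]


def tv_label(index: int) -> str:
    """Map a duration class index to its human-readable name."""
    return TV_CLASSES[index]


def load_rows(path: str) -> List[Row]:
    """Read a split file (e.g. train0.csv) into a list of column->value dicts."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))


def preprocess(text: str) -> str:
    """HYPOTHETICAL preprocessing: lowercase and collapse whitespace."""
    return " ".join(text.lower().split())


def split_validation(
    rows: List[Row],
    texts: Dict[str, str],
    val_fraction: float = 0.125,
) -> Tuple[List[Row], List[Row]]:
    """Split one fold's train rows into (train, val).

    Rows are sorted by the preprocessed target tweet text; the first
    val_fraction of distinct target tweets are held out for validation.
    """
    ordered = sorted(rows, key=lambda r: preprocess(texts[r["target"]]))
    # Distinct target IDs in sorted order (dict keeps insertion order).
    targets = list(dict.fromkeys(r["target"] for r in ordered))
    n_val = round(len(targets) * val_fraction)  # ASSUMPTION: round to nearest
    val_ids = set(targets[:n_val])
    train = [r for r in ordered if r["target"] not in val_ids]
    val = [r for r in ordered if r["target"] in val_ids]
    return train, val


if __name__ == "__main__":
    # Synthetic demo data (not from the dataset): 8 targets, 2 follow-ups each.
    texts = {str(i): f"Tweet {chr(ord('h') - i)}" for i in range(8)}
    rows = [
        {"target": str(i), "follow_up": f"follow-up {j}"}
        for i in range(8)
        for j in range(2)
    ]
    train, val = split_validation(rows, texts)
    print(len(train), len(val))  # 14 2
    print(tv_label(0))
```

With the real data, one would load a fold with `load_rows("train0.csv")`, fill `texts` with the rehydrated text for every ID in the target column, and pass both to `split_validation`. Holding out whole target tweets rather than individual rows is assumed here so that no target tweet appears in both the training and validation sets.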

Related Organizations

University of Innsbruck
Austria

Keywords

temporal reasoning, temporal commonsense, temporal validity change prediction, temporal validity, tcs

EOSC Subjects

Twitter Data
