
Randomly Sampled Users Dataset (RSU.csv)
This dataset consists of 11,173 users collected through Twitter's APIs. We collected 10,000 random English tweets in February 2023 using Twitter's Volume Stream API. The tweets were posted by around 3,000 users. For each user, we collected up to 100 of their most recent followees using Twitter's Following API. Through the Timeline and Liking APIs, we collected each user's most recent tweets (up to 3,200, due to Twitter's limit) and liked tweets (also up to 3,200). We then filtered out users with insufficient tweets (fewer than 100 original tweets, or fewer than 80 retweets/liked tweets) so that per-user sample sizes are large enough for our analyses. The final dataset contains 11,173 users and 40,405,150 tweets.

Humanities Dataset (HUM.csv)
This dataset contains 341,285 tweets and 498 Twitter accounts from selected Twitter lists, including Book Author, Christianity, Artists, Buddhism, Musician, and Philosophers. We used Twitter's List and Timeline APIs to collect the accounts and their most recent tweets (up to 1,000). The dataset was collected in January 2024.

Politics Dataset (POL.csv)
This dataset contains all tweets from selected U.S. news media and U.S. politicians, including Senators, House Members, US Governors, US Secretaries of State, US Cabinet, and US Election Officials at collection time. We used Twitter's Timeline API to collect the accounts' tweets (up to 3,200 tweets per account). The dataset was collected in May 2023 and contains 8,153,745 tweets and 3,784 Twitter accounts.

Data Fields
Due to Twitter's content redistribution policy, we are only allowed to publish tweet IDs and user IDs. Therefore, in each dataset, each row (data point) represents one tweet and contains two fields: tweet_id and user_id.
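Because each row is a single (tweet_id, user_id) pair, a per-user tweet count can be recovered with a short script. The following is a minimal sketch only: it assumes each CSV starts with a header row naming the columns tweet_id and user_id, which the description above does not confirm, and the function name count_tweets_per_user is hypothetical. IDs are kept as strings so that large 64-bit values are never rounded by downstream tools.

```python
import csv
import io
from collections import Counter
from typing import TextIO


def count_tweets_per_user(handle: TextIO) -> Counter:
    """Count rows (tweets) per user_id in an open CSV file.

    Assumption: the file has a header row with the columns
    'tweet_id' and 'user_id'. Adjust the names if the header differs.
    """
    counts: Counter = Counter()
    for row in csv.DictReader(handle):
        # IDs stay as strings; no numeric conversion is needed for counting.
        counts[row["user_id"]] += 1
    return counts


if __name__ == "__main__":
    # Tiny in-memory sample standing in for one of the published files.
    sample = io.StringIO("tweet_id,user_id\n101,7\n102,7\n103,9\n")
    for user_id, n_tweets in count_tweets_per_user(sample).most_common():
        print(user_id, n_tweets)
```

To run this against a real file, open it with `open("RSU.csv", newline="", encoding="utf-8")` and pass the handle to the function. Recovering tweet text or user profiles from the IDs requires a separate lookup through Twitter's API, which is outside the scope of this sketch.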
Twitter Data
