
Main dataset (main.csv)
The main file contains one entry per search result across all collected pages (N=28530). It comprises the following columns:
- id: Unique identifier of the file (corresponds to the last part of the filename).
- filename: Name of the file associated with the row (the file is in serp_html.zip).
- engine: The search engine used (Google Scholar or Semantic Scholar).
- browser: The web browser used for the search (Firefox or Chrome).
- region: The geographical region where the search was made.
- year: The year when the search was made.
- month: The month when the search was made.
- day: The day when the search was made.
- query: The full search query that was used.
- query_type: The type of the search query (health or technology).
- topic: The topic associated with the search query ('covid vaccines', 'cryptocurrencies', 'internet', 'social media', 'vaccines', 'coffee').
- trt: Treatment variable associated with the search (benefits or risks).
- url: The URL of the (article) search result.
- title: The title of the (article) search result.
- authorship: The author(s) of the (article) search result.
- abstract_id: Unique identifier for the abstract of the (article) search result; it links each row to annotated-abstracts_v0.6.xlsx.
- abstract_hash: Hash value of the abstract, used to check data integrity.
- link_n: The total number of results on the search page.
- rank: The rank of the search result on the search engine results page.
- annotation: The manual annotation of the (article's) abstract. One of: '1. Confirms benefits', '2. Confirms risks', '3. Confirms both benefits and risks', '4. Confirms neither benefits nor risks', '5. Abstract not related to {topic}'.
- valence: -1 for abstracts containing risks, 0 for neutral abstracts, 1 for abstracts containing only benefits.

Annotated abstracts (annotated-abstracts_v0.6.xlsx)
Manually annotated abstracts resulting from the searches.
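To illustrate how the main.csv columns combine, here is a minimal standard-library sketch that averages the valence column for each combination of search engine and treatment. The helper name mean_valence_by_group and the default grouping keys are our own choices, not part of the dataset; it assumes main.csv has a header row with the column names listed above, and it skips rows whose valence is blank in case any exist.

```python
import csv
from collections import defaultdict
from statistics import mean


def mean_valence_by_group(lines, keys=("engine", "trt")):
    """Average the `valence` column for each combination of `keys`.

    `lines` is any iterable of CSV text lines with a header row, such as an
    open main.csv file. Hypothetical helper, not shipped with the dataset.
    """
    groups = defaultdict(list)
    for row in csv.DictReader(lines):
        value = (row.get("valence") or "").strip()
        if not value:
            continue  # assumption: skip rows whose valence is missing
        # float() also accepts values exported as "1.0" instead of "1"
        groups[tuple(row[k] for k in keys)].append(float(value))
    return {group: mean(values) for group, values in groups.items()}


# Usage (the local path "main.csv" is an assumption):
# with open("main.csv", newline="", encoding="utf-8") as f:
#     for group, avg in sorted(mean_valence_by_group(f).items()):
#         print(group, round(avg, 3))
```

Grouping by other columns, for example ("engine", "topic") or ("region", "browser"), only requires changing keys.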

Raw search engine result pages (serp_html.zip)
The zip archive contains one HTML file per collected search engine result page (N=2853). Each file is referenced by the filename column of the main dataset.
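Since 28530 results were collected from 2853 pages (an average of ten results per page), many main.csv rows share one filename and therefore one HTML file. A minimal standard-library sketch of that join follows. The helper names rows_by_page and read_serp_html are hypothetical; the sketch assumes that an archive member's base name equals the filename value (possibly inside a subdirectory) and that the HTML is UTF-8 encoded.

```python
import os
import zipfile
from collections import defaultdict


def rows_by_page(rows):
    """Group main.csv rows (dicts) by the `filename` of their results page."""
    pages = defaultdict(list)
    for row in rows:
        pages[row["filename"]].append(row)
    return pages


def read_serp_html(zip_path, filename, encoding="utf-8"):
    """Return the HTML of one results page stored in serp_html.zip.

    Assumption: the member may sit in a subdirectory, so members are
    matched on their base name rather than on the full archive path.
    """
    with zipfile.ZipFile(zip_path) as archive:
        for member in archive.namelist():
            if os.path.basename(member) == filename:
                return archive.read(member).decode(encoding)
    raise KeyError(f"{filename} not found in {zip_path}")
```

Rows read with csv.DictReader from main.csv can be passed directly to rows_by_page, and each resulting key can then be opened with read_serp_html.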
