Powered by OpenAIRE graph
ZENODO
Dataset . 2024
Data sources: ZENODO

Examining bias perpetuation in academic search engines: an algorithm audit of Google and Semantic Scholar

Author: Roberto Ulloa

Abstract

Main dataset (main.csv)
The main file contains one entry per search result across all collected pages (N=28530). It has the following columns:
- id: unique identifier of the file (corresponds to the last part of the filename)
- filename: name of the HTML file associated with the row (the file is in serp_html.zip)
- engine: search engine used (Google Scholar or Semantic Scholar)
- browser: web browser used for the search (Firefox or Chrome)
- region: geographical region from which the search was made
- year: year in which the search was made
- month: month in which the search was made
- day: day on which the search was made
- query: full search query used
- query_type: type of search query (health or technology)
- topic: topic associated with the search query ('covid vaccines', 'cryptocurrencies', 'internet', 'social media', 'vaccines', 'coffee')
- trt: treatment variable associated with the search (benefits or risks)
- url: URL of the (article) search result
- title: title of the (article) search result
- authorship: author(s) of the (article) search result
- abstract_id: unique identifier of the abstract of the (article) search result; links to annotated-abstracts_v0.6.xlsx
- abstract_hash: hash of the abstract, used for data integrity
- link_n: total number of results on the search page
- rank: rank of the search result on the search engine results page
- annotation: annotation of the (article's abstract) search result; one of '1. Confirms benefits', '2. Confirms risks', '3. Confirms both benefits and risks', '4. Confirms neither benefits nor risks', '5. Abstract not related to {topic}'
- valence: -1 for abstracts containing risks, 0 for neutral abstracts, 1 for abstracts containing only benefits

Annotated abstracts (annotated-abstracts_v0.6.xlsx)
Manually annotated abstracts resulting from the searches.
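As an illustration of how the main.csv columns fit together, the sketch below reads the file with Python's standard csv module and averages the valence column per search engine and treatment (trt). It is not the author's analysis code. The helper names load_rows and mean_valence_by are hypothetical, and treating an empty or 'nan' valence as "not annotated" is an assumption about how missing values are stored.

```python
import csv
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def load_rows(lines: Iterable[str]) -> List[dict]:
    """Parse main.csv (an open file or any iterable of CSV lines) into dicts keyed by column name."""
    return list(csv.DictReader(lines))


def mean_valence_by(
    rows: Iterable[dict], keys: Tuple[str, ...] = ("engine", "trt")
) -> Dict[tuple, float]:
    """Average the valence column (-1, 0 or 1) within each group defined by `keys`.

    Assumption (hypothetical helper): rows whose valence is empty or 'nan'
    are treated as unannotated and skipped.
    """
    totals: Dict[tuple, float] = defaultdict(float)
    counts: Dict[tuple, int] = defaultdict(int)
    for row in rows:
        raw = (row.get("valence") or "").strip()
        if not raw or raw.lower() == "nan":
            continue  # no usable valence for this result
        group = tuple(row[k] for k in keys)
        totals[group] += float(raw)
        counts[group] += 1
    # Mean valence per group: negative values lean towards risk-confirming abstracts.
    return {group: totals[group] / counts[group] for group in totals}
```

With the downloaded file, usage might look like `with open("main.csv", newline="", encoding="utf-8") as f: print(mean_valence_by(load_rows(f)))`, where the UTF-8 encoding is itself an assumption. Passing keys=("engine", "topic", "trt") would split the averages further by topic.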
Raw search engine result pages (serp_html.zip)
The zip contains one HTML file per collected search engine results page (N=2853); see the filename column of the main dataset.
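Because each row of main.csv names its page in the filename column, the raw HTML can be pulled straight out of serp_html.zip without extracting the whole archive. The sketch below uses Python's zipfile module. The function name read_serp_html is hypothetical, and matching members by base name is a precaution because the folder layout inside the zip is not documented here; the UTF-8 default encoding is likewise an assumption.

```python
import io
import posixpath
import zipfile
from typing import BinaryIO, Union


def read_serp_html(
    archive: Union[str, bytes, BinaryIO], filename: str, encoding: str = "utf-8"
) -> str:
    """Return the HTML of one results page stored in serp_html.zip.

    `filename` is a value from the `filename` column of main.csv. Members are
    matched on their base name because the folder layout inside the zip is assumed.
    """
    if isinstance(archive, bytes):
        archive = io.BytesIO(archive)  # allow an archive already loaded into memory
    wanted = posixpath.basename(filename)
    with zipfile.ZipFile(archive) as zf:
        for member in zf.namelist():
            if posixpath.basename(member) == wanted:
                # errors="replace" avoids failing if a page is not valid in the assumed encoding
                return zf.read(member).decode(encoding, errors="replace")
    raise KeyError(f"{filename!r} not found in the archive")
```

For example, read_serp_html("serp_html.zip", row["filename"]) would return the page for a row loaded from main.csv; opening the archive once per call is simple but slow for many rows, so a bulk pass could keep a single ZipFile open instead.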
