ZENODO: Dataset, 2022. License: CC0. Data source: ZENODO.
DRYAD: Dataset, 2022. License: CC0. Data source: Datacite.

Irreproducibility in searches of scientific literature: a comparative analysis

Authors: Pozsgai, Gabor; Lövei, Gabor; Vasseur, Liette; Gurr, Geoff; Batáry, Péter; Korponai, Janos; Littlewood, Nick; +9 Authors

Abstract

Three major scientific search platforms (PubMed, Scopus, and Web of Science) and Google Scholar were used in this study. We generated keyword expressions (search strings) at two levels of complexity, focused on an ecological topic, and ran standardized searches from institutions around the world (see below), all within a limited timeframe. Simple search strings contained only one key phrase and no logical (Boolean) operators, whereas complex ones combined inclusion and exclusion criteria for additional related keywords and key phrases (i.e. two-word expressions within quotation marks) using Boolean operators. The simple search string was “ecosystem services”, while the complex one was “ecosystem service” AND “promoting” AND “crop” NOT “livestock”. The search language was set to English in every case, and only titles, abstracts, and keywords were searched. Because Google Scholar offers no option to limit the search to titles, keywords, and abstracts, the default search was used there. Since different search platforms use slightly different expressions for the same query, exact search term formats were generated for each platform.

Searches were conducted on one or two machines at each of 12 institutions in Australia, Canada, China, Denmark, Germany, Hungary, the UK, and the USA (Supplementary material 2), using three commonly used browsers (Mozilla Firefox, Internet Explorer, and Google Chrome). Searches were run manually (i.e. no APIs were used) according to strict protocols, which standardized the search date, the exact search term for every run, and the data recording procedure. Not all platforms were queried from every location: Google products are not available in China, and Scopus was not available at some institutions (Supplementary material 2). The original version of the protocol is provided in Supplementary material 3. The first run was conducted at 11:00 Australian Eastern Standard Time (01:00 GMT) on 13 April 2018, and the last at 18:16 Eastern Daylight Time (22:16 GMT) on the same day. After each search run, the number of hits was recorded and the bibliographic data of the first 20 articles were extracted and saved in a file format offered by the website (.csv, .txt). Once the search combinations were completed, the browser cache was emptied to ensure that the testers’ previous searches did not influence the results, and the process was repeated. At four locations (Flakkebjerg, Denmark; Fuzhou, China; St. Catharines, Canada; Orange, Australia) the searches were also repeated on two different computers. This resulted in 228, 132, 228, and 144 search runs for Web of Science, Scopus, PubMed, and Google Scholar, respectively.

Results were collected from each contributor, and bibliographic information was automatically extracted from the identically structured saved files using a loop in the R statistical software (R Core Team, 2012) and stored in a standardized MySQL database, allowing unique publications to be distinguished. If unique identifiers for individual articles were missing, authors, titles, or a combination of these were searched for, and uniqueness was double-checked across the entire dataset. Saved data files with non-standard structures were processed manually. All data cleaning and manipulation were done in R.
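As an illustration only, the collation step described above might look like the following minimal R sketch. The directory name, the column names (Title, Authors, DOI), the fallback deduplication key, and the database name and credentials are assumptions made for the example; they are not taken from the dataset itself.

# Minimal sketch of collating identically structured exports into one
# deduplicated table and storing it in a MySQL-compatible database.
library(DBI)   # assumes the DBI and RMariaDB packages are installed

files <- list.files("search_exports", pattern = "\\.csv$", full.names = TRUE)

records <- do.call(rbind, lapply(files, function(f) {
  hits <- read.csv(f, stringsAsFactors = FALSE)
  hits$source_file <- basename(f)   # remember which search run each hit came from
  hits
}))

# Distinguish unique publications: use the DOI where present, otherwise
# fall back to a lowercased title + authors key
key <- ifelse(!is.na(records$DOI) & records$DOI != "",
              tolower(records$DOI),
              tolower(paste(records$Title, records$Authors)))
unique_records <- records[!duplicated(key), ]

# Store the standardized table
con <- dbConnect(RMariaDB::MariaDB(), dbname = "searches",
                 host = "localhost", user = "user", password = "password")
dbWriteTable(con, "hits", unique_records, overwrite = TRUE)
dbDisconnect(con)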

1. Repeatability is a cornerstone of science and is particularly important for systematic reviews. However, little is known about how researchers’ choice of database and search platform influences the repeatability of systematic reviews. Here, we aim to reveal how the computing environment and the location from which a search is initiated influence the hits returned.

2. We present a comparative analysis of time-synchronized searches at different institutional locations around the world and evaluate the consistency of the hits obtained for each search term on different search platforms.

3. We found large variation among search platforms: PubMed and Scopus returned consistent results for identical search strings run from different locations, whereas Google Scholar and Web of Science’s Core Collection varied substantially, both in the number of hits and in the individual articles returned, depending on the search location and computing environment. The inconsistency in Web of Science results most likely stems from the different licensing packages held by different institutions.

4. To maintain scientific integrity and consistency, especially in systematic reviews, action is needed from both the scientific community and the search platforms to increase search consistency. Researchers are encouraged to report the search location and the databases used for systematic reviews, and database providers should make their search algorithms transparent and revise the rules governing access to titles behind paywalls. Additional options for increasing the repeatability and transparency of systematic reviews include storing both search metadata and hit results in open repositories and using Application Programming Interfaces (APIs) to retrieve standardized, machine-readable search metadata.
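As a sketch of what point 4 suggests (and not part of the protocol above, where searches were run manually), a standardized, machine-readable search could be retrieved through PubMed's E-utilities API, here via the rentrez R package. The package choice and the retmax value are assumptions made for the example.

# Retrieve the complex search string's hits through an API instead of a browser
library(rentrez)

query <- '"ecosystem service" AND "promoting" AND "crop" NOT "livestock"'

res <- entrez_search(db = "pubmed", term = query, retmax = 20)

res$count   # total number of hits, comparable across machines and locations
res$ids     # PubMed IDs of the first 20 records, in machine-readable form

Because the query, the hit count, and the record identifiers are plain data, they can be archived in an open repository together with the review, as recommended above.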

Keywords

Database, Search Engine, FOS: Agricultural sciences, Information retrieval, Repeatability, Evidence synthesis methods, Reproducibility

Metrics
BIP! impact indicators: citations 1; popularity Average; influence Average; impulse Average.
OpenAIRE UsageCounts: 7 views, 6 downloads.