Powered by OpenAIRE graph
ZENODO
Dataset . 2021
License: CC BY SA
Data sources: Datacite
Data sources: ZENODO

Wilcoxon Rank Sum Test and Keyphrase Extraction Data Cited in "What Everyone Says: Public Perceptions of the Humanities in the Media"

Authors: WhatEvery1Says (WE1S) Project

Abstract

This repository contains the Wilcoxon rank sum test and keyphrase extraction data cited in the WhatEvery1Says (WE1S) Project's article "What Everyone Says: Public Perceptions of the Humanities in the Media". The organization of the materials is described below.

Wilcoxon Rank Sum Test

All data and results from Wilcoxon rank sum testing can be found in the wilcoxon-tests folder of the extracted zip file we1s_about_the_humanities.zip. The Wilcoxon rank sum test identifies specific words that appear significantly more often in one group of documents than in another, giving researchers an understanding of which words are “distinctive” to each group. Further information on WE1S's use of Wilcoxon rank sum testing can be found at https://we1s.ucsb.edu/wp-content/uploads/M-15-Wilcoxon-Test.pdf.

Each subdirectory in the wilcoxon-test folder contains the data and results of one comparison experiment based on a metadata category, such as whether the data contain articles published by public or private institutions. Each data file is a .txt file representing a sample of the overall data from the collection. The README file describes the collection used, the sample size, and the nature of the comparison. The results of the test are in a file called results.csv, which includes one row for each term included in the test. Each row displays the term, the term's raw count in each category compared (count 1 and count 2), the difference between the two counts (count 1 minus count 2), the percentage change in the counts, the Wilcoxon statistic, and the Wilcoxon p-value.
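The kind of per-term comparison recorded in results.csv can be sketched in Python. This is a minimal, hypothetical sketch, not WE1S's actual code: it assumes each document is already tokenized and non-empty, and that the value compared for each term is its relative frequency in each document. The function names (`average_ranks`, `rank_sum_test`, `compare_groups`) and the formula used for percentage change are illustrative assumptions. The statistic is the large-sample normal approximation (a z score, with tied values given average ranks and no tie correction), the same form returned by SciPy's `scipy.stats.ranksums`.

```python
import math
from collections import Counter


def average_ranks(values):
    """Rank values 1..n in ascending order; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # positions are 0-based, ranks are 1-based
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def rank_sum_test(x, y):
    """Return (z, two-sided p) for the Wilcoxon rank sum test of sample x vs. sample y.

    Normal approximation without tie correction. A positive z means values
    in x tend to be larger than values in y.
    """
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    rank_sum_x = sum(ranks[:n1])
    expected = n1 * (n1 + n2 + 1) / 2
    std = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (rank_sum_x - expected) / std
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p


def compare_groups(docs1, docs2):
    """Build one results.csv-style row per term, sorted so category 1 terms come first.

    docs1, docs2: lists of tokenized, non-empty documents (lists of strings).
    Assumption: each document contributes the term's relative frequency.
    """
    counts1 = [Counter(doc) for doc in docs1]
    counts2 = [Counter(doc) for doc in docs2]
    vocab = set().union(*counts1, *counts2)
    rows = []
    for term in vocab:
        x = [c[term] / sum(c.values()) for c in counts1]
        y = [c[term] / sum(c.values()) for c in counts2]
        count1 = sum(c[term] for c in counts1)
        count2 = sum(c[term] for c in counts2)
        stat, p = rank_sum_test(x, y)
        rows.append({
            "term": term,
            "count1": count1,
            "count2": count2,
            "difference": count1 - count2,
            # Hypothetical definition of percentage change; None when count2 == 0.
            "pct_change": (count1 - count2) / count2 * 100 if count2 else None,
            "wilcoxon_stat": stat,
            "p_value": p,
        })
    # Greatest-to-least statistic puts the terms most associated with category 1 on top.
    rows.sort(key=lambda r: r["wilcoxon_stat"], reverse=True)
    return rows
```

With this sign convention a positive statistic means the term tends to be relatively more frequent in category 1, so sorting from greatest to least surfaces category 1 terms first and the end of the list holds category 2 terms.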
Sorting results.csv by the Wilcoxon statistic from greatest to least brings the terms most strongly associated with category 1 to the top (category 1 is the category listed first in the title field of the README.md file for each test); sorting it from least to greatest brings the terms most strongly associated with category 2 to the top (category 2 is the category listed second). The p-value column indicates how confident you can be that each term's difference between the two categories is significant: the lower the p-value, the stronger the evidence that the difference is not due to chance.

Keyphrase Extraction

All data and results from keyphrase extraction can be found in the keyphrase-extraction folder of the extracted zip file we1s_about_the_humanities.zip. Keyphrase extraction generates a list of the most significant words or phrases (1-6 words long) within individual documents. WE1S takes the top ten keyphrases in each document and ranks them according to their frequency across the collection. WE1S uses the SGRank algorithm for keyphrase extraction; because this algorithm is computationally intensive, WE1S limits keyphrases to lemmatized nouns and proper nouns within a window of 70 words to either side of candidate keyphrases. Further information on WE1S's use of keyphrase extraction can be found at https://we1s.ucsb.edu/wp-content/uploads/M-14-Keyphrase-Extraction.pdf.

Each subdirectory in the keyphrase-extraction folder contains the data and results of keyphrase extraction on a particular collection; details of the collection and resulting files can be found in each subdirectory. Each list of keyphrases is in a file called SGRank.csv, which lists the keyphrases and their number of occurrences in the collection.

The article additionally cites keyphrases that are shared with the terms in the public topic model produced by Andrew Goldstone and Ted Underwood, “The Quiet Transformations of Literary Studies: What Thirteen Thousand Scholars Could Tell Us,” New Literary History 45, no. 3 (2014): 359–84, https://doi.org/10.1353/nlh.2014.0025. The list of terms is derived from the public visualization at https://www.sas.rutgers.edu/virtual/ag978/quiet/#/words. Keyphrases extracted from WE1S data were split into single-word terms and compared with the vocabulary in Goldstone and Underwood's word list (quiet_transformations_wordlist.txt) to compile lists of shared vocabulary. These lists are given in files called shared_terms.txt. Note that keyphrases were extracted from corpora produced using the Python Textacy library. Because these corpora contain the full text of articles with intellectual property restrictions, they cannot be reproduced here.
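The collection-level ranking and the shared-vocabulary comparison described above can be sketched as follows. This is a hypothetical illustration rather than the WE1S pipeline: it assumes per-document keyphrases have already been extracted (for example by an SGRank implementation such as Textacy's) as (keyphrase, score) pairs ordered best first, that a keyphrase's collection count is the number of documents whose top ten includes it, and that words are compared case-insensitively. The function names `rank_collection_keyphrases` and `shared_terms` are illustrative.

```python
from collections import Counter


def rank_collection_keyphrases(doc_keyphrases, top_n=10):
    """Rank keyphrases by how often they appear in each document's top_n list.

    doc_keyphrases: one list per document of (keyphrase, score) pairs, best first.
    Assumption: the collection count is the number of top_n lists containing the phrase.
    Returns (keyphrase, count) pairs, most frequent first (ties keep first-seen order).
    """
    counts = Counter()
    for keyphrases in doc_keyphrases:
        counts.update(phrase for phrase, _score in keyphrases[:top_n])
    return counts.most_common()


def shared_terms(ranked_keyphrases, wordlist):
    """Split keyphrases into single words and keep those found in the reference word list."""
    vocabulary = {word.lower() for word in wordlist}
    terms = {word.lower() for phrase, _count in ranked_keyphrases for word in phrase.split()}
    return sorted(terms & vocabulary)
```

Splitting on whitespace mirrors the description of keyphrases being "split into single-word terms"; a real run would also need to match whatever normalization (such as lemmatization) was applied to the reference word list.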

  • BIP! impact indicators (based on the underlying citation network)
    citations: 0
    popularity: Average
    influence: Average
    impulse: Average
  • OpenAIRE UsageCounts
    views: 20
    downloads: 3