ZENODO
Dataset . 2022
License: CC BY
Data sources: Datacite
Data sources: ZENODO

Web Data Commons (October 2021) Property and Datatype Usage Dataset

Authors: Jan Martin Keil

Abstract

This is a dataset about the usage of properties and datatypes in the Web Data Commons RDFa, Microdata, Embedded JSON-LD, and Microformats Data Sets (October 2021), based on the Common Crawl October 2021 archive. The dataset was produced using the RDF Property and Datatype Usage Scanner v2.1.1, which is based on the Apache Jena framework. Only RDFa and embedded JSON-LD data were considered, as Microdata and Microformats do not incorporate explicit datatypes.

Dataset Properties

Size: 0.2 GiB compressed, 4.4 GiB uncompressed, 20 361 829 rows plus 1 header line, determined using gunzip -c measurements.csv.gz | wc -l

Parsing Failures: The scanner failed to parse 45 833 332 triples (~0.1 %) of the source dataset (containing 38 812 275 607 triples).

Content:

  • CATEGORY: The category (html-embedded-jsonld or html-rdfa) of the Web Data Commons file that has been measured.
  • FILE_URL: The URL of the Web Data Commons file that has been measured.
  • MEASUREMENT: The applied measurement with specific conditions, one of:
      • UnpreciseRepresentableInDouble: The number of lexicals that are in the lexical space but not in the value space of xsd:double.
      • UnpreciseRepresentableInFloat: The number of lexicals that are in the lexical space but not in the value space of xsd:float.
      • UsedAsDatatype: The total number of literals with the datatype.
      • UsedAsPropertyRange: The number of statements that specify the datatype as range of the property.
      • ValidDateNotation: The number of lexicals that are in the lexical space of xsd:date.
      • ValidDateTimeNotation: The number of lexicals that are in the lexical space of xsd:dateTime.
      • ValidDecimalNotation: The number of lexicals that represent a number in decimal notation and whose lexical representation is thereby in the lexical space of xsd:decimal, xsd:float, and xsd:double.
      • ValidExponentialNotation: The number of lexicals that represent a number in exponential notation and whose lexical representation is thereby in the lexical space of xsd:float and xsd:double.
      • ValidInfOrNaNNotation: The number of lexicals that equal either INF, +INF, -INF, or NaN and whose lexical representation is thereby in the lexical space of xsd:float and xsd:double.
      • ValidIntegerNotation: The number of lexicals that represent an integer number and whose lexical representation is thereby in the lexical space of xsd:integer, xsd:decimal, xsd:float, and xsd:double.
      • ValidTimeNotation: The number of lexicals that are in the lexical space of xsd:time.
      • ValidTrueOrFalseNotation: The number of lexicals that equal either true or false and whose lexical representation is thereby in the lexical space of xsd:boolean.
      • ValidZeroOrOneNotation: The number of lexicals that equal either 0 or 1 and whose lexical representation is thereby in the lexical space of xsd:boolean, xsd:integer, xsd:decimal, xsd:float, and xsd:double.
    Note: Lexical representations of xsd:double values in embedded JSON-LD were normalized to always use exponential notation with up to 16 fractional digits (see related code). Be careful when drawing conclusions from the corresponding Valid… and Unprecise… measures.
  • PROPERTY: The property that has been measured.
  • DATATYPE: The datatype that has been measured.
  • QUANTITY: The count of statements that fulfill the condition specified by the measurement per file, property, and datatype.
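
The two Unprecise… measurements can be illustrated with a minimal Python sketch. It assumes one plausible reading of "in the lexical space but not in the value space": the lexical form parses as a number, but its exact decimal value differs from the nearest binary floating-point value. The function names are hypothetical, and this is not the scanner's own code, which is Java on top of Apache Jena.

```python
import struct
from decimal import Decimal


def is_exact_in_double(lexical: str) -> bool:
    """True if the exact decimal value of `lexical` is an IEEE 754 double.

    Assumes a finite number in decimal or exponential notation.
    """
    # Decimal(float) converts the binary value exactly, so the comparison
    # reveals any rounding that happened while parsing to a double.
    return Decimal(lexical) == Decimal(float(lexical))


def is_exact_in_float(lexical: str) -> bool:
    """Same check for single precision.

    Assumes the value is within the finite float32 range; struct.pack
    raises OverflowError otherwise.
    """
    # Round-trip through 4 bytes to obtain the nearest 32-bit float.
    as_float32 = struct.unpack("f", struct.pack("f", float(lexical)))[0]
    return Decimal(lexical) == Decimal(as_float32)
```

Under this reading, `is_exact_in_double("0.1")` is false while `is_exact_in_double("0.5")` is true, so a literal such as "0.1" typed as xsd:double would be counted by UnpreciseRepresentableInDouble.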
Preview

"CATEGORY","FILE_URL","MEASUREMENT","PROPERTY","DATATYPE","QUANTITY"
"html-rdfa","http://data.dws.informatik.uni-mannheim.de/structureddata/2021-12/quads/dpef.html-rdfa.nq-00000.gz","UnpreciseRepresentableInDouble","https://www.w3.org/2006/vcard/ns#longitude","https://www.w3.org/2001/XMLSchema#float","1"
"html-rdfa","http://data.dws.informatik.uni-mannheim.de/structureddata/2021-12/quads/dpef.html-rdfa.nq-00000.gz","UnpreciseRepresentableInDouble","https://www.w3.org/2006/vcard/ns#latitude","https://www.w3.org/2001/XMLSchema#float","1"
"html-rdfa","http://data.dws.informatik.uni-mannheim.de/structureddata/2021-12/quads/dpef.html-rdfa.nq-00000.gz","UnpreciseRepresentableInDouble","https://purl.org/goodrelations/v1#hasCurrencyValue","https://www.w3.org/2001/XMLSchema#float","6"
…
"html-embedded-jsonld","http://data.dws.informatik.uni-mannheim.de/structureddata/2021-12/quads/dpef.html-embedded-jsonld.nq-06239.gz","ValidZeroOrOneNotation","http://schema.org/ratingValue","http://www.w3.org/2001/XMLSchema#integer","96"
"html-embedded-jsonld","http://data.dws.informatik.uni-mannheim.de/structureddata/2021-12/quads/dpef.html-embedded-jsonld.nq-06239.gz","ValidZeroOrOneNotation","http://schema.org/minValue","http://www.w3.org/2001/XMLSchema#integer","164"
"html-embedded-jsonld","http://data.dws.informatik.uni-mannheim.de/structureddata/2021-12/quads/dpef.html-embedded-jsonld.nq-06239.gz","ValidZeroOrOneNotation","http://schema.org/width","http://www.w3.org/2001/XMLSchema#integer","361"

Note: The data contain malformed IRIs, like "xsd:dateTime" (instead of probably "http://www.w3.org/2001/XMLSchema#dateTime"), which are caused by missing namespace definitions in the original source website.
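
When aggregating the rows, the malformed IRIs mentioned in the note can be folded into their probable full form. The following Python sketch shows one way to do that while summing QUANTITY per datatype. It assumes the file is UTF-8 encoded and that an "xsd:" prefix was meant to be the XML Schema namespace, as the note suggests; the helper names `repair_datatype`, `sum_by_datatype`, and `read_rows` are hypothetical.

```python
import csv
import gzip
from collections import Counter

XSD_NS = "http://www.w3.org/2001/XMLSchema#"


def repair_datatype(iri: str) -> str:
    """Expand an undeclared 'xsd:' prefix; leave every other value untouched."""
    prefix = "xsd:"
    if iri.startswith(prefix):
        return XSD_NS + iri[len(prefix):]
    return iri


def sum_by_datatype(rows, measurement: str = "UsedAsDatatype") -> Counter:
    """Sum QUANTITY per (repaired) DATATYPE over all rows of one MEASUREMENT."""
    totals = Counter()
    for row in rows:
        if row["MEASUREMENT"] == measurement:
            totals[repair_datatype(row["DATATYPE"])] += int(row["QUANTITY"])
    return totals


def read_rows(path: str):
    """Yield the rows of the gzip-compressed CSV as dicts keyed by the header."""
    # Assumption: the CSV is UTF-8 encoded.
    with gzip.open(path, mode="rt", encoding="utf-8", newline="") as handle:
        yield from csv.DictReader(handle)
```

A call such as `sum_by_datatype(read_rows("measurements.csv.gz"))` streams the file once, which matters for 4.4 GiB of uncompressed data. Whether the repaired and the well-formed IRIs should be merged at all depends on the analysis, since the repair is only a guess about the publisher's intent.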
Reproduce

To reproduce this dataset, check out the RDF Property and Datatype Usage Scanner v2.1.0 and execute:

    mvn clean package
    java -jar target/Scanner.jar --category html-rdfa --list http://webdatacommons.org/structureddata/2021-12/files/html-rdfa.list October2021
    java -jar target/Scanner.jar --category html-embedded-jsonld --list http://webdatacommons.org/structureddata/2021-12/files/html-embedded-jsonld.list October2021
    ./measure.sh October2021
    # Wait until the scan has completed. This will take a few days
    java -jar target/Scanner.jar --results ./October2021/measurements.csv.gz October2021
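
After a reproduction run, the resulting file can be compared against the size stated under Dataset Properties. The sketch below is a Python stand-in for gunzip -c measurements.csv.gz | wc -l: it counts newline bytes in the decompressed stream. The function name and the chunk size are illustrative choices, and whether a rerun yields exactly the same count is an assumption that has not been verified here.

```python
import gzip

# Row count stated in the dataset description, plus the header line.
EXPECTED_LINES = 20_361_829 + 1


def count_newlines(path: str, chunk_size: int = 1 << 20) -> int:
    """Count newline bytes in a gzip file, like `gunzip -c path | wc -l`."""
    total = 0
    with gzip.open(path, "rb") as handle:
        # Read fixed-size chunks so the 4.4 GiB of data never sits in memory.
        while chunk := handle.read(chunk_size):
            total += chunk.count(b"\n")
    return total
```

Comparing `count_newlines("./October2021/measurements.csv.gz")` with `EXPECTED_LINES` gives a quick sanity check before any deeper comparison of the contents.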
