Powered by OpenAIRE graph
ZENODO
Dataset . 2018
License: CC BY NC
Data sources: Datacite
ZENODO
Dataset . 2018
License: CC BY NC
Data sources: ZENODO

This Research product is the result of merged Research products in OpenAIRE.

The BigGrams: the semi-supervised information extraction system from HTML: an improvement in the wrapper induction - dataset

Authors: Mirończuk Marcin

Abstract

The zip file contains two folders. The "websites" folder includes web pages from websites such as agatameble.pl (an e-shop), filmweb.pl (a website about films), and ptaki.info (a website about birds). The "reference-seeds" folder contains three subfolders, i.e. agatameble.pl, filmweb.pl, and ptaki.info. Each subfolder contains a reference-seeds.csv file. This file holds the reference instances, i.e. the carefully labeled ground truth of the corresponding values in each web page of the websites mentioned above.
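
The folder layout described above can be read with a few lines of Python. This is only a minimal sketch under stated assumptions: the archive has already been extracted to a local directory, the CSV files are UTF-8 encoded and comma-delimited, and the function name `load_reference_seeds` is hypothetical. The column layout of reference-seeds.csv is not documented here, so rows are returned as raw lists of strings rather than named fields.

```python
import csv
from pathlib import Path


def load_reference_seeds(root):
    """Map each website subfolder name to the raw rows of its reference-seeds.csv.

    `root` is the directory the zip file was extracted to (an assumption;
    adjust it to your own extraction path).
    """
    seeds_dir = Path(root) / "reference-seeds"
    seeds = {}
    for site_dir in sorted(p for p in seeds_dir.iterdir() if p.is_dir()):
        csv_path = site_dir / "reference-seeds.csv"
        if not csv_path.is_file():
            # Skip subfolders that do not follow the described layout.
            continue
        # Assumed: UTF-8 text and the csv module's default comma delimiter.
        with csv_path.open(newline="", encoding="utf-8") as handle:
            seeds[site_dir.name] = list(csv.reader(handle))
    return seeds
```

Calling `load_reference_seeds` on the extraction directory should yield one entry per website subfolder (agatameble.pl, filmweb.pl, ptaki.info), provided the files match the assumptions above. If the files use a different delimiter or encoding, pass the appropriate arguments to `csv.reader` and `open`.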

Keywords

reference instances, website information extraction, information extraction
