This dataset contains two files: `original_data.zip` and `website_5folds.zip`.

`original_data.zip` unpacks into three .csv files: `Place.csv`, `CreativeWork.csv`, and `LocalBusiness.csv`. Each row holds one entity, and that entity belongs to a subclass of the class named by the file. There are 8 columns:

- the first 2 columns are simply the row index
- `description_t`: the long textual description of the entity
- `schemaorg_class`: the schema.org class assigned to the entity
- `name_tpage_domain`: always empty
- `name_t`: the name of the entity
- `page_domain`: the website where the entity's mark-up data is found
- `label`: an index for the `schemaorg_class`
- `description`: the name of the entity (`name_t`) plus the first sentence of its description (from `description_t`)

`website_5folds.zip` is a transformation of `original_data.zip`. It unzips into three folders: `Place`, `LocalBusiness`, and `CreativeWork`. Each folder contains five sub-folders, `0`, `1`, `2`, `3`, and `4`, one per fold, and each numbered sub-folder holds a `train.csv` and a `test.csv` file. Each CSV file contains one entity per row, with the following columns:

- the first column is simply the row index
- `schemaorg_class`: the schema.org class assigned to the entity
- `name_t`: the name of the entity
- `description`: the name of the entity (`name_t`) plus the first sentence of its description (from `description_t`)
- `page_domain`: the name of the entity plus the processed domain name. Processing consists of parsing the domain URL, extracting the host name, applying word segmentation (e.g. tescobank -> tesco bank), and removing stopwords and TLDs (e.g. co, uk, com, fr).

Because it is a transformation of `original_data.zip`, `website_5folds.zip` in fact contains multiple replications of the same data, one train/test split per fold. It was created for a 5-fold cross-validation experiment, and the splits ensure that the training and test sets of a fold share no `page_domain`.
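The `description` column in both archives combines `name_t` with the first sentence of `description_t`. A minimal Python sketch of that construction follows. The sentence-boundary rule and the helper names `first_sentence` and `build_description` are assumptions for illustration; the splitter actually used to build the dataset is not documented.

```python
import re

# ASSUMPTION: a sentence ends at '.', '!' or '?' followed by whitespace.
# The splitter actually used to build the dataset is not documented.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def first_sentence(text: str) -> str:
    """Return the first sentence of `text` (the whole text if no boundary is found)."""
    text = text.strip()
    if not text:
        return ""
    return _SENTENCE_END.split(text, maxsplit=1)[0]


def build_description(name_t: str, description_t: str) -> str:
    """Hypothetical helper: entity name followed by the first sentence of its description."""
    parts = [name_t.strip(), first_sentence(description_t)]
    # Skip empty parts so a missing description does not leave a trailing space.
    return " ".join(p for p in parts if p)


# Illustrative input only, not a row from the dataset.
print(build_description("Riverside Café", "A small café by the river. Open daily."))
# prints: Riverside Café A small café by the river.
```

Under this simple rule, abbreviations such as "St." would end the sentence early; a proper sentence tokenizer would avoid that at the cost of an extra dependency.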
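The `page_domain` processing used for `website_5folds.zip` (parse the URL, extract the host name, segment words, drop stopwords and TLDs, prepend the entity name) can be approximated as below. This is a sketch under stated assumptions: the tiny `VOCAB`, `STOPWORDS` and `TLDS` sets and all function names are hypothetical, and a simple dictionary-based, fewest-words segmenter stands in for whatever word-segmentation tool the dataset authors actually used.

```python
from urllib.parse import urlparse

# HYPOTHETICAL, deliberately tiny lists for illustration only; the lists
# used to build the dataset are not published in this description.
VOCAB = {"tesco", "bank", "river", "side", "cafe"}
STOPWORDS = {"the", "of", "and", "www"}
TLDS = {"co", "uk", "com", "fr", "org", "net"}


def segment(token, vocab=VOCAB):
    """Split `token` into vocabulary words using dynamic programming,
    preferring the split with the fewest words; return [token] if none exists."""
    n = len(token)
    best = [None] * (n + 1)  # best[i] = fewest-word segmentation of token[:i]
    best[0] = []
    for i in range(1, n + 1):
        for j in range(i):
            word = token[j:i]
            if best[j] is not None and word in vocab:
                cand = best[j] + [word]
                if best[i] is None or len(cand) < len(best[i]):
                    best[i] = cand
    return best[n] if best[n] is not None else [token]


def process_domain(url):
    """Parse the URL, take the host name, drop TLD labels, segment the
    remaining labels into words and remove stopwords."""
    # urlparse only fills .hostname when a scheme (or a leading '//') is present.
    if "//" not in url:
        url = "//" + url
    host = urlparse(url).hostname or ""
    words = []
    for label in host.split("."):
        if not label or label in TLDS:
            continue
        for word in segment(label):
            if word not in STOPWORDS:
                words.append(word)
    return " ".join(words)


def page_domain_feature(name_t, url):
    """Entity name plus processed domain, mirroring the fold files' page_domain column."""
    return f"{name_t} {process_domain(url)}".strip()


print(process_domain("https://www.tescobank.com/"))  # prints: tesco bank
print(page_domain_feature("Tesco Bank", "tescobank.co.uk"))  # prints: Tesco Bank tesco bank
```

The fewest-words rule is a crude proxy for frequency-based segmenters; with a realistic vocabulary it can prefer odd splits, so a unigram-frequency model is usually the better choice in practice.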
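Keeping every website out of either the training or the test set of each fold is grouped k-fold cross-validation with the domain as the group. A minimal sketch follows, assuming each row is a dict keyed by the `original_data.zip` column names and that grouping uses the raw `page_domain` (website) column. The function name and the greedy largest-group-first balancing are illustrative choices, not the authors' documented procedure.

```python
from collections import defaultdict


def grouped_k_fold(rows, k=5, key="page_domain"):
    """Yield (train, test) lists for k folds so that each value of `key`
    appears in exactly one test fold and never in both train and test
    of the same fold.

    Groups are assigned largest-first to the currently smallest fold,
    a greedy heuristic that keeps fold sizes roughly balanced."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)

    fold_of = {}
    sizes = [0] * k
    # Sort by group size (descending), then by key, for a deterministic result.
    for value, members in sorted(groups.items(), key=lambda kv: (-len(kv[1]), kv[0])):
        target = sizes.index(min(sizes))
        fold_of[value] = target
        sizes[target] += len(members)

    for fold in range(k):
        test = [r for r in rows if fold_of[r[key]] == fold]
        train = [r for r in rows if fold_of[r[key]] != fold]
        yield train, test


# Toy rows with hypothetical domains, for illustration only.
rows = [{"page_domain": d, "id": i}
        for i, d in enumerate(["a", "a", "b", "c", "c", "c", "d", "e", "f", "g"])]
for train, test in grouped_k_fold(rows):
    print(len(train), len(test))
```

scikit-learn's `GroupKFold` provides the same no-shared-group guarantee if a library implementation is preferred.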
Keywords: linked data, schema.org, entity classification
| Indicator | Description | Value |
| --- | --- | --- |
| Selected citations | Citations derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically). | 0 |
| Popularity | Reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network. | Average |
| Influence | Reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically). | Average |
| Impulse | Reflects the initial momentum of an article directly after its publication, based on the underlying citation network. | Average |
| Views | | 7 |
| Downloads | | 6 |

Views and downloads provided by UsageCounts.