
This repository contains a sample dataset that demonstrates the use of web pages as a data source for visual-aware machine learning applications using the FitLayout framework. The dataset captures the rendered pages of the fictional bookstore available at https://books.toscrape.com/.

For each book page in the bookstore, the dataset contains two FitLayout artifacts:

Page -- directly describes the rendered page at the box level.
AreaTree -- provides an abstraction over the rendered page in the form of a tree of visual areas, where significant areas (e.g., book title and price) are annotated with the corresponding tags.

The artifacts have been exported from the FitLayout RDF repository in the N-QUADS format, which makes it easy to import them into another repository.

Contained files

book_urls.txt -- the source URLs of the rendered pages.
books_artifacts.zip -- the RDF graph describing all the artifacts, serialized in the N-QUADS format.
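
Before importing the data into an RDF repository, it can help to take a quick look at the two files. The following is a minimal sketch in Python using only the standard library; it is not part of FitLayout. The helper names read_urls and count_statements are hypothetical, and the names of the files inside books_artifacts.zip are not stated here, so the sketch simply iterates over whatever members the archive contains. It relies on the fact that N-QUADS is a line-based format with one statement per line.

```python
import io
import zipfile
from pathlib import Path


def read_urls(path):
    """Return the non-empty lines of a URL list file such as book_urls.txt."""
    text = Path(path).read_text(encoding="utf-8")
    return [line.strip() for line in text.splitlines() if line.strip()]


def count_statements(zip_path):
    """Count non-blank, non-comment lines for each member of the archive.

    N-QUADS stores one statement per line, so this gives a rough
    statement count without needing a full RDF parser.
    """
    counts = {}
    with zipfile.ZipFile(zip_path) as archive:
        for name in archive.namelist():
            if name.endswith("/"):
                continue  # skip directory entries
            with archive.open(name) as raw:
                lines = io.TextIOWrapper(raw, encoding="utf-8")
                counts[name] = sum(
                    1
                    for line in lines
                    if line.strip() and not line.lstrip().startswith("#")
                )
    return counts
```

Assuming both files are in the current directory, read_urls("book_urls.txt") returns the list of source URLs and count_statements("books_artifacts.zip") returns a dictionary mapping each archive member to its approximate statement count. For a real import, a proper N-QUADS parser or the target repository's own import tool is the better choice.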
