Standardized Project Gutenberg Corpus

Dataset OPEN
Martin Gerlach; Francesc Font-Clos

Standardized Project Gutenberg Corpus
version: SPGC-2018-07-18
number of books: 55905
uncompressed size: 3GB (counts) + 18GB (tokens)

Publication
https://arxiv.org/abs/1812.08092 ...

  • Download from Zenodo (Dataset)