software . 2020

pdf2blocks

Stéphan BERNARD; Catherine ROUSSEY
Open Access French
  • Published: 06 Oct 2020
  • Publisher: Zenodo
Abstract
This Python script converts a PDF file written in French into an HTML file. The conversion organizes the textual content of the PDF into separate blocks, each of which is transformed into an HTML section: H1, H2, P, FigCaption, Footer, or Header.

The program relies on pdftohtml and pdftotext, two tools from the Poppler library (https://poppler.freedesktop.org/).

It is run from the command line:

    python pdf2blocks.py /link/to/file.pdf

The result is written to standard output. The algorithm is described, in French, in the README.md file of the archive.
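Because the script writes its HTML to standard output, a caller usually has to capture that output to keep it. The following is a minimal sketch of one way to do this from Python. It assumes that pdf2blocks.py sits in the current working directory and that the Poppler tools are installed and on the PATH. The helper names build_command and convert_to_html are hypothetical and are not part of the archive.

```python
import subprocess
import sys
from pathlib import Path


def build_command(pdf_path, script="pdf2blocks.py"):
    """Build the documented command line: python pdf2blocks.py <file.pdf>.

    The default script location is an assumption (current directory).
    """
    return [sys.executable, script, str(pdf_path)]


def convert_to_html(pdf_path, html_path, script="pdf2blocks.py"):
    """Run the script and save whatever it writes to standard output.

    Hypothetical helper: it only wraps the documented invocation and
    does not reimplement the block-detection algorithm.
    """
    result = subprocess.run(
        build_command(pdf_path, script),
        stdout=subprocess.PIPE,  # the result is written to standard output
        check=True,              # raise CalledProcessError on a non-zero exit
    )
    # Write the raw bytes so no assumption is made about the output encoding.
    Path(html_path).write_bytes(result.stdout)
    return Path(html_path)
```

For example, convert_to_html("/link/to/file.pdf", "file.html") would store the generated HTML in file.html. From a shell, plain output redirection (python pdf2blocks.py /link/to/file.pdf > file.html) has the same effect.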
Subjects
ACM Computing Classification System: ComputingMethodologies_DOCUMENTANDTEXTPROCESSING
free text keywords: python, pdf, html, poppler, text extraction, french
Download from
Zenodo
Provider: Datacite