software . 2020


Stéphan BERNARD; Catherine ROUSSEY
Open Access French
  • Published: 06 Oct 2020
  • Publisher: Zenodo
This Python script converts a PDF file written in French into an HTML file. The conversion organizes the textual content of the PDF into separate blocks, each of which is transformed into an HTML section: H1, H2, P, FigCaption, Footer, or Header.

This program uses pdftohtml and pdftotext, two tools from the poppler library.

It is run from the command line:

    python /link/to/file.pdf

The result is written to standard output. The algorithm is described, in French, in a file included in the archive.
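The block-to-section idea described above can be illustrated with a minimal sketch. It is not the authors' algorithm, which is documented in the archive. It assumes that blocks are separated by blank lines in the output of pdftotext and that each block has already been labeled with a kind. The names `TAGS`, `extract_text`, `split_blocks` and `render_blocks` are hypothetical and do not come from the script.

```python
import html
import re
import subprocess

# Hypothetical mapping from a block label to an HTML tag (an assumption,
# chosen to mirror the section types listed in the description).
TAGS = {
    "h1": "h1",
    "h2": "h2",
    "p": "p",
    "figcaption": "figcaption",
    "footer": "footer",
    "header": "header",
}


def extract_text(pdf_path):
    """Return the text of a PDF using poppler's pdftotext (must be installed).

    The trailing "-" asks pdftotext to write to standard output.
    """
    result = subprocess.run(
        ["pdftotext", "-layout", pdf_path, "-"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def split_blocks(text):
    """Split text into blocks on blank lines and join each block's lines."""
    blocks = []
    for chunk in re.split(r"\n\s*\n", text):
        lines = [line.strip() for line in chunk.splitlines() if line.strip()]
        if lines:
            blocks.append(" ".join(lines))
    return blocks


def render_blocks(labeled_blocks):
    """Turn (kind, text) pairs into HTML; unknown kinds fall back to <p>."""
    parts = []
    for kind, text in labeled_blocks:
        tag = TAGS.get(kind, "p")
        parts.append(f"<{tag}>{html.escape(text)}</{tag}>")
    return "\n".join(parts)
```

In this sketch the labeling step is left out on purpose: deciding whether a block is a heading, a caption or a footer is the substance of the real script, and would presumably rely on the font and position information that pdftohtml provides.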
ACM Computing Classification System: ComputingMethodologies_DOCUMENTANDTEXTPROCESSING
free text keywords: python, pdf, html, poppler, text extraction, french
Provider: Datacite