Powered by OpenAIRE graph
ZENODO
Dataset . 2023
License: CC BY
Data sources: Datacite, ZENODO

im2latex 230k

Authors: Maruss, Gregory; Alansari, Nawaf

Abstract

The dataset comprises over 230,000 LaTeX math formulas and their corresponding .png images. The images vary in size and have a resolution of 72 dpi. The formulas were parsed from LaTeX sources provided at http://www.cs.cornell.edu/projects/kddcup/datasets.html (originally from arXiv). The dataset was generated using a tool built with JavaScript and Python, which is available on GitHub: https://github.com/gmarus777/Printed-Latex-Data-Generation

Contents:
- folder `generated_png_images` contains the PNG images
- `corresponding_png_images.txt`: each line contains the filename of a PNG image in the folder `generated_png_images`
- `final_png_formulas.txt`: each line contains a corresponding LaTeX formula
- `230k.json` contains a vocabulary consisting of 579 tokens

Version 3 updates:
- Dataset size increased to 230k (from 180k)
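As a rough illustration of how the listed files might be used together, the sketch below pairs each image filename with a formula. It assumes that line i of `corresponding_png_images.txt` and line i of `final_png_formulas.txt` describe the same sample, which the description suggests but does not state outright. The function name `load_pairs` and the `root` argument (the directory the archive was extracted to) are hypothetical and are not part of the dataset or its generation tool.

```python
from pathlib import Path


def read_lines(path):
    # Read a text file line by line, dropping only the trailing newline
    # so that any whitespace inside a formula is preserved.
    with open(path, encoding="utf-8") as handle:
        return [line.rstrip("\n") for line in handle]


def load_pairs(root):
    # Hypothetical helper: returns a list of (image_path, formula) tuples.
    # ASSUMPTION: the two index files are aligned line by line.
    root = Path(root)
    names = read_lines(root / "corresponding_png_images.txt")
    formulas = read_lines(root / "final_png_formulas.txt")
    if len(names) != len(formulas):
        raise ValueError(
            f"index files differ in length: {len(names)} vs {len(formulas)}"
        )
    image_dir = root / "generated_png_images"
    return [(image_dir / name.strip(), formula)
            for name, formula in zip(names, formulas)]
```

Calling `load_pairs("path/to/extracted/dataset")` would then give pairs that can be passed to an image loader of choice. The length check is a cheap guard: if the alignment assumption is wrong, it fails immediately instead of silently mismatching images and formulas.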

Keywords

arxiv, OCR, Latex, equations, image-2-latex, image-to-latex, FOS: Mathematics, im2latex, Tex, formulas, pdf, Mathematics
