ZENODO
Dataset . 2021
License: CC BY
Data sources: Datacite
ZENODO
Dataset . 2021
License: CC BY
Data sources: ZENODO
Smithsonian figshare
Dataset . 2021
License: CC BY

1QIsaa data collection (binarized images, feature files, and plotting scripts) for writer identification test using artificial intelligence and image-based pattern recognition techniques

Authors: Popović, Mladen; Dhali, Maruf A.; Schomaker, Lambert

Abstract

The Great Isaiah Scroll (1QIsaa) data set for writer identification

This data set was collected for the ERC project "The Hands that Wrote the Bible: Digital Palaeography and Scribal Culture of the Dead Sea Scrolls".
PI: Mladen Popović
Grant agreement ID: 640497
Project website: https://cordis.europa.eu/project/id/640497

Copyright (c) University of Groningen, 2021. All rights reserved.

Disclaimer and copyright notice for all data contained in this .tar.gz file:

1) Permission is hereby granted to use the data for research purposes. It is not allowed to distribute this data for commercial purposes.
2) The provider gives no express or implied warranty of any kind, and any implied warranties of merchantability and fitness for purpose are disclaimed.
3) The provider shall not be liable for any direct, indirect, special, incidental, or consequential damages arising out of any use of this data.
4) The user should refer to the first public article on this data set:
   Popović, M., Dhali, M. A., & Schomaker, L. (2020). Artificial intelligence-based writer identification generates new evidence for the unknown scribes of the Dead Sea Scrolls exemplified by the Great Isaiah Scroll (1QIsaa). arXiv preprint arXiv:2010.14476.
   BibTeX:
   @article{popovic2020artificial,
     title={Artificial intelligence based writer identification generates new evidence for the unknown scribes of the Dead Sea Scrolls exemplified by the Great Isaiah Scroll (1QIsaa)},
     author={Popovi{\'c}, Mladen and Dhali, Maruf A and Schomaker, Lambert},
     journal={arXiv preprint arXiv:2010.14476},
     year={2020}
   }
5) The recipient should refrain from distributing the data set to third parties outside his/her local research group. Please refer interested researchers to this site to obtain their own copy.

Organisation of the data:

The .tar.gz file contains three directories: images, features, and plots. The included 'README' file contains all the instructions.

The 'images' directory contains NetPBM images of the columns of 1QIsaa. The NetPBM format was chosen for its simplicity and because it rules out any doubt about lossy compression in the processing chain. There are two images for each column of the Great Isaiah Scroll: one is the direct binarized output from the BiNet system (arxiv.org/abs/1911.07930), and the other is the manually cleaned version of that output. File names for the direct binarized output have the format '1QIsaa_col<columnnr>.pbm', for example '1QIsaa_col15.pbm'. File names for the cleaned version have the format '1QIsaa_col<columnnr>_cleaned.pbm', for example '1QIsaa_col15_cleaned.pbm'. Note: the direct and cleaned image files are not kept in separate directories; they are extracted to the same place. Because the names are unique, extracting them into a single directory causes no conflicts.

The 'features' directory contains feature files computed for each of the column images. There are two types of feature files: Hinge and Adjoined. They are distinguishable by their extension, for example '1QIsaa_col15_cleaned.hinge' and '1QIsaa_col15_cleaned.adjoined'. They are also arranged in separate directories for ease of use.

The 'plots' directory contains a simple Python script that performs PCA on the feature files and visualizes the result in a 3D plot. The script takes the location of the feature files as input. The 'README_plot' file contains examples of how to run it in the terminal.

Brief description:

The original images are grayscale JPEG (.jpg) files from the Brill collection; ImageMagick's 'identify' tool reports them as '8-bit Gray 256c'. These images pass through several preprocessing steps to make them suitable for pattern recognition-based techniques. The first step is image binarization. To prevent any classification of the text-column images based on irrelevant background patterns, a specific binarization technique (BiNet) was applied, keeping the original ink traces intact. After binarization, the images were cleaned further by removing the adjacent columns that partially appear in the target columns' images. Finally, a few minor affine transformations and stretching corrections were applied in a restrictive manner. These corrections also aim to align the text where lines are twisted due to degradation of the leather writing surface. For this reason, the cleaned images are provided alongside the direct binarized images. No effort has been made to obtain a balanced set in any way.

Tools:

Binarization: The BiNet tool is available for scientific use upon request (m.a.dhali(at)rug.nl).

Image morphing: In the original article, data augmentation was performed using image morphing. The tool is available on GitHub: https://github.com/GrHound/imagemorph.c

Features for writer identification (Lambert Schomaker):
http://www.ai.rug.nl/~lambert/allographic-fraglet-codebooks/allographic-fraglet-codebooks.html
http://www.ai.rug.nl/~lambert/hinge/hinge-transform.html

1. Schomaker, L. & Bulacu, M. (2004). Automatic writer identification using connected-component contours and edge-based features of upper-case Western script. IEEE Transactions on Pattern Analysis and Machine Intelligence, 26(6), June 2004, pp. 787-798.
2. Bulacu, M. & Schomaker, L.R.B. (2007). Text-independent Writer Identification and Verification Using Textural and Allographic Features. IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), Special Issue - Biometrics: Progress and Directions, 29(4), April 2007, pp. 701-717.

The features (hinge, fraglets) have been combined in a single MS Windows application, GIWIS, which is available for scientific use upon request (l.r.b.schomaker(at)rug.nl).

If you have any questions, please contact us:
Maruf A. Dhali <m.a.dhali(at)rug.nl>
Lambert Schomaker <l.r.b.schomaker(at)rug.nl>
Mladen Popović <m.popovic(at)rug.nl>

Please cite our papers if you use this data set:

1. Popović, M., Dhali, M. A., & Schomaker, L. (2020). Artificial intelligence based writer identification generates new evidence for the unknown scribes of the Dead Sea Scrolls exemplified by the Great Isaiah Scroll (1QIsaa). arXiv preprint arXiv:2010.14476.
2. Dhali, M. A., de Wit, J. W., & Schomaker, L. (2019). Binet: Degraded-manuscript binarization in diverse document textures and layouts using deep encoder-decoder networks. arXiv preprint arXiv:1911.07930.
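
The file-naming convention described under "Organisation of the data" lends itself to a small helper for sorting the extracted files by column. The following is a minimal sketch and is not part of the data set: the names `parse_name`, `ScrollFile`, and `read_pbm_size` are hypothetical, and the pattern assumes that every file follows the '1QIsaa_col<columnnr>[_cleaned].<ext>' form quoted above. Whether feature files also exist for the non-cleaned images is not stated in the description, so the pattern simply accepts both forms. The header reader relies only on the general NetPBM bitmap layout (magic number, width, height, optional '#' comments).

```python
import re
from typing import NamedTuple, Optional, Tuple

# Pattern built from the naming convention quoted in the description:
#   1QIsaa_col<columnnr>.pbm            direct BiNet output
#   1QIsaa_col<columnnr>_cleaned.pbm    manually cleaned image
#   ..._cleaned.hinge / ..._cleaned.adjoined   feature files
NAME_RE = re.compile(
    r"^1QIsaa_col(?P<col>\d+)(?P<cleaned>_cleaned)?\.(?P<ext>pbm|hinge|adjoined)$"
)


class ScrollFile(NamedTuple):
    """Hypothetical record describing one file of the collection."""
    column: int    # column number of the scroll
    cleaned: bool  # True for the manually cleaned variant
    kind: str      # "image", "hinge", or "adjoined"


def parse_name(filename: str) -> Optional[ScrollFile]:
    """Parse a file name; return None if it does not match the convention."""
    m = NAME_RE.match(filename)
    if m is None:
        return None
    ext = m.group("ext")
    kind = "image" if ext == "pbm" else ext
    return ScrollFile(int(m.group("col")), m.group("cleaned") is not None, kind)


def read_pbm_size(data: bytes) -> Tuple[int, int]:
    """Return (width, height) from the header of a NetPBM bitmap (P1 or P4)."""
    tokens = []
    pos = 0
    while len(tokens) < 3 and pos < len(data):
        ch = data[pos:pos + 1]
        if ch.isspace():
            pos += 1
        elif ch == b"#":
            # Header comments run to the end of the line.
            nl = data.find(b"\n", pos)
            pos = len(data) if nl == -1 else nl + 1
        else:
            end = pos
            while end < len(data) and not data[end:end + 1].isspace():
                end += 1
            tokens.append(data[pos:end])
            pos = end
    if len(tokens) < 3 or tokens[0] not in (b"P1", b"P4"):
        raise ValueError("not a PBM bitmap header")
    return int(tokens[1]), int(tokens[2])
```

A typical use would be to call `parse_name` on each entry of the extracted directory and group the results by `column`, and to pass the bytes of a .pbm file to `read_pbm_size` as a quick sanity check before further processing.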

Keywords

Evolutionary Biology, Writer identification, Chemical Sciences not elsewhere classified, Information Systems not elsewhere classified, Plant Biology, Marine Biology, Great Isaiah Scroll, Infectious Diseases, Sociology, Artificial Intelligence, Pattern recognition, Document analysis, Genetics, Medicine, Historical manuscript dating, Physical Sciences not elsewhere classified, Biotechnology
