Powered by OpenAIRE graph
Found an issue? Give us feedback
image/svg+xml art designer at PLoS, modified by Wikipedia users Nina, Beao, JakobVoss, and AnonMoos Open Access logo, converted into svg, designed by PLoS. This version with transparent background. http://commons.wikimedia.org/wiki/File:Open_Access_logo_PLoS_white.svg art designer at PLoS, modified by Wikipedia users Nina, Beao, JakobVoss, and AnonMoos http://www.plos.org/ ZENODOarrow_drop_down
image/svg+xml art designer at PLoS, modified by Wikipedia users Nina, Beao, JakobVoss, and AnonMoos Open Access logo, converted into svg, designed by PLoS. This version with transparent background. http://commons.wikimedia.org/wiki/File:Open_Access_logo_PLoS_white.svg art designer at PLoS, modified by Wikipedia users Nina, Beao, JakobVoss, and AnonMoos http://www.plos.org/
ZENODO
Other ORP type . 2019
License: CC BY
Data sources: Datacite
image/svg+xml art designer at PLoS, modified by Wikipedia users Nina, Beao, JakobVoss, and AnonMoos Open Access logo, converted into svg, designed by PLoS. This version with transparent background. http://commons.wikimedia.org/wiki/File:Open_Access_logo_PLoS_white.svg art designer at PLoS, modified by Wikipedia users Nina, Beao, JakobVoss, and AnonMoos http://www.plos.org/
ZENODO
Other ORP type . 2019
License: CC BY
Data sources: Datacite
image/svg+xml art designer at PLoS, modified by Wikipedia users Nina, Beao, JakobVoss, and AnonMoos Open Access logo, converted into svg, designed by PLoS. This version with transparent background. http://commons.wikimedia.org/wiki/File:Open_Access_logo_PLoS_white.svg art designer at PLoS, modified by Wikipedia users Nina, Beao, JakobVoss, and AnonMoos http://www.plos.org/
ZENODO
Other ORP type . 2019
License: CC BY
Data sources: Datacite
image/svg+xml art designer at PLoS, modified by Wikipedia users Nina, Beao, JakobVoss, and AnonMoos Open Access logo, converted into svg, designed by PLoS. This version with transparent background. http://commons.wikimedia.org/wiki/File:Open_Access_logo_PLoS_white.svg art designer at PLoS, modified by Wikipedia users Nina, Beao, JakobVoss, and AnonMoos http://www.plos.org/
ZENODO
Other ORP type . 2019
License: CC BY
Data sources: Datacite
image/svg+xml art designer at PLoS, modified by Wikipedia users Nina, Beao, JakobVoss, and AnonMoos Open Access logo, converted into svg, designed by PLoS. This version with transparent background. http://commons.wikimedia.org/wiki/File:Open_Access_logo_PLoS_white.svg art designer at PLoS, modified by Wikipedia users Nina, Beao, JakobVoss, and AnonMoos http://www.plos.org/
ZENODO
Other ORP type . 2019
License: CC BY
Data sources: Datacite
image/svg+xml art designer at PLoS, modified by Wikipedia users Nina, Beao, JakobVoss, and AnonMoos Open Access logo, converted into svg, designed by PLoS. This version with transparent background. http://commons.wikimedia.org/wiki/File:Open_Access_logo_PLoS_white.svg art designer at PLoS, modified by Wikipedia users Nina, Beao, JakobVoss, and AnonMoos http://www.plos.org/
ZENODO
Other ORP type . 2019
License: CC BY
Data sources: Datacite
image/svg+xml art designer at PLoS, modified by Wikipedia users Nina, Beao, JakobVoss, and AnonMoos Open Access logo, converted into svg, designed by PLoS. This version with transparent background. http://commons.wikimedia.org/wiki/File:Open_Access_logo_PLoS_white.svg art designer at PLoS, modified by Wikipedia users Nina, Beao, JakobVoss, and AnonMoos http://www.plos.org/
ZENODO
Other ORP type . 2019
License: CC BY
Data sources: Datacite
versions View all 7 versions
addClaim

This Research product is the result of merged Research products in OpenAIRE.

You have already added 0 works in your ORCID record related to the merged Research product.

segmented Sanskrit corpus (proof of concept)

Authors: Lugli, Ligeia;

segmented Sanskrit corpus (proof of concept)

Abstract

This is a proof-of-concept Sanskrit corpus developed for the study of Buddhist Sanskrit lexicology. It comprises: 131 metadata-enriched Buddhist Sanskrit texts for a total of ~ 4 million words (~ 8 million tokens) a ~ 2 million words reference corpus comprising 30 metadata-enriched non-Buddhist Sanskrit texts. The corpus is in romanised Sanskrit (UTF-8 encoding) and is available in three configurations: segmented (with dash-separated words) segmented and stemmed (with capitalised word stem and compounds separated by an @ sign). segmented, stemmed and normalised (normalisation treats some spelling variation and solves sandhi of stems' initials in most cases), recommended for Word Sketches. The latter version can be used to generate word sketches in Sketch Engine if used in conjunction with the included sketch grammar, which infers likely syntactic dependencies from morphological cues. **avagraha has been replaced with a** Limitations As a proof of concept, this corpus suffers from several limitations. It is very small by contemporary standards, it has not been proof-read and it is currently only segmented and stemmed (not lemmatised or PoS tagged). A funding bid has been submitted to expand and lemmatise the corpus. Data Quality The corpus has been segmented with Lugli's Sanskrit segmenter (10.5281/zenodo.3459215). The accuracy of this segmenter has been evaluated at 97% on a sample of Buddhist Sanskrit literature. No evaluation has been performed on non-Buddhist materials and the quality of the segmentation may be worse in the non-Buddhist section of the corpus. Please refer to the segmenter documentation stored at 10.5281/zenodo.3459215 for details on the stemming conventions used in the corpus. Acknowledgments The corpus has been realised as part of the project 'Lexis and Tradition: variation in the vocabulary of Sanskrit Mahāyāna literature'. This project was funded by the British Academy through a Newton International Fellowship (NF161436) and hosted at the Department of Theology and Religious Studies at King's College London under the supervision of Prof. Henrietta Kate Crosby. Dr. Bruno Galasek-Hul has contributed to version 1.4 thanks to funding from the Mangalam Research Center for Buddhist Languages. Thanks to GRETIL, Dr. Vinita Tseng and Prof. Steinkellner for kindly giving their permission to include automatically processed versions of some of their editions in this corpus. Changelog version 1.4 adds 59 Buddhist texts and fixes some recurrent segmentation errors version 1.4.1 corrects some spacing and sentence parsing errors

Also included: bibliography cum metadata summary and a sketch grammar + corpus configuration file for use in Sketch Engine

Related Organizations
Keywords

Sanskrit, Buddhist Sanskrit, corpus

  • BIP!
    Impact byBIP!
    selected citations
    These citations are derived from selected sources.
    This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
    0
    popularity
    This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
    Average
    influence
    This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
    Average
    impulse
    This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.
    Average
    OpenAIRE UsageCounts
    Usage byUsageCounts
    visibility views 10
    download downloads 9
  • 10
    views
    9
    downloads
    Powered byOpenAIRE UsageCounts
Powered by OpenAIRE graph
Found an issue? Give us feedback
visibility
download
selected citations
These citations are derived from selected sources.
This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
BIP!Citations provided by BIP!
popularity
This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
BIP!Popularity provided by BIP!
influence
This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
BIP!Influence provided by BIP!
impulse
This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.
BIP!Impulse provided by BIP!
views
OpenAIRE UsageCountsViews provided by UsageCounts
downloads
OpenAIRE UsageCountsDownloads provided by UsageCounts
0
Average
Average
Average
10
9