Research software . Software . 2019

Buddhist Sanskrit Segmenter

Lugli, Ligeia
Open Access
English
Abstract

This folder contains R code for a rule-based Buddhist Sanskrit Segmenter and Lemmatiser, together with the data needed to use and evaluate the Segmenter and explanatory materials. The Segmenter has been tested on 639 sentences from 13 Buddhist texts (9 sūtras, 4 śāstras) and has been evaluated as achieving 97% accuracy. The code and materials in this folder were developed as part of a Newton International Fellowship at King's College London, funded by the British Academy (NF161436).

Contents

  • R code for segmentation, lemmatisation, normalization and evaluation (includes instructions to run the code)
  • PowerPoint presentation with background and explanation of the project
  • Wordlists and Wordlists documentation
  • n-gram and stem frequency tables necessary for segmentation
  • gold standard set of manually segmented and stemmed sentences for evaluation
  • set of raw sentences for evaluation
  • evaluation of the Krisha et al. seq2seq segmenter on Buddhist sentences, for reference purposes

This segmenter has been used to prepare the Sanskrit Corpus at DOI 10.5281/zenodo.3457822 and its later version at DOI 10.5281/zenodo.3526035.

Subjects

Buddhist Sanskrit, Natural Language Processing
