ZENODO
Dataset . 2023
License: CC BY
Data sources: ZENODO
Data sources: Datacite

Tibetan for Spacy 1.1

Authors: Engels, James; Erhard, Franz Xaver; Barnett, Robert; Hill, Nathan W.

Abstract

Tibetan for SpaCy is a language model for Tibetan designed for use in the SpaCy environment, and it was trained using SpaCy. It relies on an external tokenizer, Botok, to segment the Tibetan text, and it replaces the Tibetan syllable separator (tseg) with a white space wherever the tseg occurs as a word separator. SpaCy is then told to interpret the resulting input as English. This produces good results with standard vocabulary but fails with unrecognised words. The project is currently working on a more sophisticated version of Tibetan for SpaCy. The package includes a list of stop words.

Tibetan for SpaCy was developed by James Engels as part of Divergent Discourses, a joint project between SOAS University of London and Leipzig University, funded by the AHRC in the UK and the DFG in Germany. The project developed Tibetan for SpaCy particularly for users who want to work with Tibetan texts within the Leipzig Corpus Miner (iLCM), an advanced text-mining interface designed for social scientists. The instructions for using Tibetan for SpaCy within the iLCM are in the readme file. These instructions assume that the user has already downloaded and installed the iLCM.

Please acknowledge the Divergent Discourses project if using this material (note that copyright ultimately belongs to the two participating universities). Contact: rb75@soas.ac.uk
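The preprocessing described in the abstract can be illustrated with a minimal Python sketch. It is not the project's code: it assumes the text has already been segmented into words (for example by Botok), and the function names `join_segmented_words` and `remove_stop_words`, as well as the sample input, are hypothetical. It only shows the idea of dropping the word-final tseg and separating words with spaces so that a whitespace-based pipeline can treat each word as one token.

```python
TSEG = "\u0f0b"  # Tibetan syllable separator (tseg)


def join_segmented_words(words):
    """Join already-segmented words with spaces, dropping each word-final tseg.

    Tsegs inside a multi-syllable word are kept, so only the tsegs that
    act as word separators are replaced by white space.
    """
    cleaned = []
    for word in words:
        stripped = word.strip().rstrip(TSEG)
        if stripped:  # skip items that were only whitespace or tsegs
            cleaned.append(stripped)
    return " ".join(cleaned)


def remove_stop_words(text, stop_words):
    """Split on white space and drop tokens found in a stop-word set."""
    return [token for token in text.split() if token not in stop_words]


# Hypothetical segmenter output: two words of two syllables each.
segmented = ["བཀྲ་ཤིས་", "བདེ་ལེགས་"]
text = join_segmented_words(segmented)
print(text)
```

The resulting string could then be handed to a pipeline that tokenizes on white space, such as a blank English pipeline created with `spacy.blank("en")`. The stop-word set passed to `remove_stop_words` is a placeholder here and stands in for the list shipped with the package.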

Keywords

iLCM, SpaCy, NLP for Tibetan, Tokeniser, Botok, Tibetan
