
Diverge-Gemini POS-tagged Corpus of Modern Tibetan is a modern Tibetan text corpus compiled from a wide range of sources, including Tibetan-language books and newspapers from the 1950s and 1960s as well as the 2000s, published in the Republic of India and the People's Republic of China. The corpus was automatically Part-of-Speech (POS)-tagged with Universal Dependencies (UD) tags using Google's Gemini Pro 1.5 model via the Google Cloud API. Tagging was done with the Divergent Discourses Gemini Pro 1.5 POS-tagger. To avoid arbitrary tokenisation, the raw data was tokenised with the Modern_Botok dialect pack for Botok v3.13 before Gemini POS-tagging. The files are in CoNLL-U format.

Diverge-Gemini POS-tagged Modern Tibetan Corpus.zip contains the raw files as returned from Gemini Pro 1.5. Diverge-Gemini POS-tagged Modern Tibetan Corpus Normalised.zip contains a set of cleaned-up and normalised files.

The following sources were used:

(I) Books:
Ma hphung 馬烽. 1954. Gnyen bsgrigs kyi lo rgyus. Pe cin: Mi rigs dpe skrun khang (50kb).
Hri hphun 石峯. 1955. Krung go'i mi dmangs rang 'thad dmag gi dmag mi phal ba zhig gi lo rgyus: Mtsho sngon mi dmangs dpe skrun khang (95kb).
Le'u hro'o chi 劉少奇. 1950. Le'u hro'o chi'i lnga gcig gtams bshad. Pe cin: Krung dbyang mi dmangs srid gzhung mi rigs don byed u yon lhan khang (93kb).
Lin khru'u 林初. 1955. Deng dus kyi the wan. Pe cin: Mi rigs dpe skrun khang (131kb).
Ma'o tse tung 毛澤東. 1952. Dmangs gtso'i ring lugs gsar pa'i bstan bcos. Pe cing: Krung dbyang mi dmangs srid gzhung mi rigs don byed u yon lhan khang (383kb).
Hu yun 胡芸. 1957. Mes rgyal gyi yul ljongs. Pe cin: Mi rigs dpe skrun khang (162kb).
Nyi zla skar gsum. 1955. Pe cin: Mi rigs dpe skrun khang (45kb).
(II) Newspapers:
(1) Transcribed by Divergent Discourses:
bod mi'i rang dbang (India, 13 issues from 1965, 666kb)
dar mdo'i gsar 'gyur (PRC, ten issues from 1954-55, 1MB)
dkar mdzes nyin re'i gsar 'gyur (PRC, 10 issues from 1959, 672kb)
gsar 'gyur mdor bsdus (PRC, 16 issues from 1953-1954, 895kb)
kan lho'i gsar 'gyur (PRC, 12 issues from 1959, 517kb)
min ciang gsar 'gyur (PRC, nine issues from 1953-59, 783kb)
mtsho sngon bod yig gsar 'gyur (PRC, 14 issues from 1951-1965, 1.2MB)
Rang dbang gsar shog (India, seven issues from 1961-1965, 594kb)
Rang dbang srung skyob gsar shog (India, five issues from 1963-65, 226kb)
yul phyogs so so'i gsar 'gyur me long (India, 12 issues from 1950-63, 938kb)
(2) Scraped from the internet:
(2.1) Esukhia Tibetan news corpus (India):
Bangchen (12,542 articles, 121MB)
BOD Asia (161MB)
Gyalwa Rinpoche (575 articles, 9.4MB)
Radio Free Asia (26,890 articles, 117MB)
Tibet Times (18,090 articles, 218MB)
Voice of America (VOA) Tibetan (1,100 articles, 13MB)
Voice of Tibet (VOT) (7,452 articles, 68MB)
(2.2) From the PRC:
bod ljongs nyin re'i tshags par (4,155 articles, 162MB)

The corpus was tagged for the Divergent Discourses project, led by Franz Xaver Erhard (Leipzig University) and Robert Barnett (SOAS, London).
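Because the files are in CoNLL-U format with UD tags, they can be read with a few lines of standard-library Python. The sketch below is a hand-rolled reader, not a tool shipped with the corpus. It assumes that the archives contain files ending in `.conllu` and that FORM and UPOS sit in the standard CoNLL-U columns 2 and 4. The helper names `iter_conllu_sentences` and `upos_counts_from_zip` are hypothetical. Because the raw archive holds unedited model output, the reader skips lines that do not have exactly ten tab-separated fields rather than failing on them.

```python
import io
import zipfile
from collections import Counter
from typing import IO, Iterable, Iterator, List, Tuple, Union


def iter_conllu_sentences(lines: Iterable[str]) -> Iterator[List[Tuple[str, str]]]:
    """Yield each sentence as a list of (FORM, UPOS) pairs.

    Hypothetical helper: a minimal CoNLL-U reader, not an official corpus tool.
    """
    sentence: List[Tuple[str, str]] = []
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            # A blank line ends the current sentence.
            if sentence:
                yield sentence
                sentence = []
            continue
        if line.startswith("#"):
            continue  # sentence-level comments such as "# sent_id = ..."
        cols = line.split("\t")
        if len(cols) != 10:
            continue  # assumption: tolerate malformed lines in raw model output
        token_id = cols[0]
        if "-" in token_id or "." in token_id:
            continue  # multiword-token ranges and empty nodes have no UPOS of their own
        sentence.append((cols[1], cols[3]))  # FORM is column 2, UPOS is column 4
    if sentence:
        yield sentence  # final sentence without a trailing blank line


def upos_counts_from_zip(zip_source: Union[str, IO[bytes]]) -> Counter:
    """Count UPOS tags over every *.conllu member of a zip archive (assumed layout)."""
    counts: Counter = Counter()
    with zipfile.ZipFile(zip_source) as zf:
        for name in zf.namelist():
            if not name.endswith(".conllu"):
                continue
            with zf.open(name) as fh:
                text = io.TextIOWrapper(fh, encoding="utf-8")
                for sent in iter_conllu_sentences(text):
                    counts.update(upos for _, upos in sent)
    return counts
```

For example, `upos_counts_from_zip("Diverge-Gemini POS-tagged Modern Tibetan Corpus Normalised.zip").most_common(10)` would list the ten most frequent UPOS tags, provided the archive is laid out as assumed above. Comparing the same counts on the raw archive gives a rough view of what normalisation changed.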
