This upload contains JSON mappings to IPA for all pinyin-romanized Chinese syllables retrieved from a large corpus. The corpus containing the original Chinese words was taken from uni-leipzig (Leipzig Corpora Collection); we used the 1M Wikipedia corpus from 2018. Each syllable was extracted and converted to pinyin with pypinyin (v0.47.1) using dict-from-pypinyin (v0.0.1); the pinyin was then transcribed to IPA using pinyin-to-ipa (v0.0.1). Only the first possible transcription was included in the mappings. Note: tone sandhi is not considered, since the vocabulary consists only of stand-alone syllables.

Files:
- hanzi-vocabulary.txt: the hanzi vocabulary (Chinese syllables) from which pinyin was transcribed, e.g., 㩳
- pinyin-ipa-map-NORMAL.json (418 mappings): toneless pinyin mapped to IPA, in pypinyin style NORMAL, e.g., beng
- pinyin-ipa-map-TONE.json (1400 mappings): pinyin with tones mapped to IPA, in pypinyin style TONE, e.g., bèng
- pinyin-ipa-map-TONE2.json (1400 mappings): pinyin with tones mapped to IPA, in pypinyin style TONE2, e.g., be4ng
- pinyin-ipa-map-TONE3.json (1400 mappings): pinyin with tones mapped to IPA, in pypinyin style TONE3, e.g., beng4
- pinyin-ipa-map-TONE3-all.json (2508 mappings): all theoretical combinations of pinyin with tones mapped to IPA, in pypinyin style TONE3, e.g., beng4
- oov-vocabulary.txt: the vocabulary for which no pinyin could be transcribed (because the symbol is not a Chinese character or has no pinyin representation), e.g., 방 or 㕔
- script.sh: the script to reproduce all results
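The four pypinyin styles named in the file list differ only in how the tone is written. The following is a minimal Python sketch of that relationship, starting from a TONE3 syllable. It is not pypinyin's implementation: the helper names (`split_tone3`, `tone_vowel_index`, `to_styles`) are hypothetical, the tone-mark placement follows the common pinyin rule (a or e first, then the o in ou, otherwise the last vowel), and only tones 1 to 4 are handled; syllables without a tone digit are assumed to stay unmarked.

```python
import unicodedata

# Combining diacritics for tones 1-4: macron, acute, caron, grave.
TONE_MARKS = {1: "\u0304", 2: "\u0301", 3: "\u030c", 4: "\u0300"}
VOWELS = "aeiouü"


def split_tone3(syllable):
    """Split a TONE3 syllable such as 'beng4' into ('beng', 4)."""
    if syllable and syllable[-1].isdigit():
        return syllable[:-1], int(syllable[-1])
    return syllable, None


def tone_vowel_index(base):
    """Return the index of the vowel that carries the tone mark, or None."""
    for vowel in "ae":
        if vowel in base:
            return base.index(vowel)
    if "ou" in base:
        return base.index("o")
    # Otherwise the last vowel of the syllable carries the mark.
    for i in range(len(base) - 1, -1, -1):
        if base[i] in VOWELS:
            return i
    return None


def to_styles(tone3_syllable):
    """Derive NORMAL, TONE and TONE2 spellings from a TONE3 syllable."""
    base, tone = split_tone3(tone3_syllable)
    idx = tone_vowel_index(base)
    if tone in TONE_MARKS and idx is not None:
        # Compose vowel + combining mark into a single precomposed character.
        marked = unicodedata.normalize("NFC", base[idx] + TONE_MARKS[tone])
        tone_form = base[:idx] + marked + base[idx + 1:]
        tone2_form = base[:idx + 1] + str(tone) + base[idx + 1:]
    else:
        # Assumption of this sketch: no tone digit (or tone 5) means no mark.
        tone_form = tone2_form = base
    return {
        "NORMAL": base,
        "TONE": tone_form,
        "TONE2": tone2_form,
        "TONE3": tone3_syllable,
    }


print(to_styles("beng4"))
```

For `beng4` this yields the same four spellings used as examples in the file list: beng, bèng, be4ng and beng4.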
References: D. Goldhahn, T. Eckart & U. Quasthoff: Building Large Monolingual Dictionaries at the Leipzig Corpora Collection: From 100 to 200 Languages. In: Proceedings of the 8th International Language Resources and Evaluation (LREC'12), 2012.
Funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) – Project-ID 416228727 – CRC 1410
Chinese, Pinyin, Phonetics, Mandarin, IPA, Hanzi