Data-Driven Part-of-Speech Tagging of Kiswahili

Publication type: Part of book or chapter of book, Conference object, Article | Date: 01 Jan 2006 | Countries: Kenya, Belgium | Publisher: Springer Berlin Heidelberg

Authors: Guy De Pauw; Gilles-Maurice de Schryver; Peter Waiganjo Wagacha

doi: 10.1007/11846406_25

handle: 10067/613820151162165141, 1854/LU-351319, 11295/37322

Abstract

In this paper we present experiments with data-driven part-of-speech taggers trained and evaluated on the annotated Helsinki Corpus of Swahili. Using four of the current state-of-the-art data-driven taggers, TnT, MBT, SVMTool and MXPOST, we find the last of these, MXPOST, to be the most accurate tagger for the Kiswahili dataset. We further improve on the performance of the individual taggers by combining them into a committee of taggers. We observe that the more naive combination methods, like the novel plural voting approach, outperform more elaborate schemes like cascaded classifiers and weighted voting. This paper is the first publication to present experiments on data-driven part-of-speech tagging for Kiswahili and Bantu languages in general.
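As a rough illustration of what a committee of taggers involves, the sketch below combines per-token predictions by simple majority voting. This is a generic baseline and not the paper's plural voting method, whose details are not given in this record. The function name `majority_vote`, the toy tags, the sample predictions, and the tie-breaking order after MXPOST are all assumptions made for the example.

```python
from collections import Counter


def majority_vote(predictions, priority):
    """Combine tag sequences from several taggers by simple majority vote.

    predictions: dict mapping a tagger name to its list of tags (one per token).
    priority: tagger names ordered from most to least trusted; used only to
              break ties (hypothetical ordering, not taken from the paper).
    """
    if set(priority) != set(predictions):
        raise ValueError("priority must list exactly the taggers in predictions")
    lengths = {len(tags) for tags in predictions.values()}
    if len(lengths) != 1:
        raise ValueError("all taggers must tag the same number of tokens")

    combined = []
    for i in range(lengths.pop()):
        # Count how many taggers proposed each tag for token i.
        votes = Counter(tags[i] for tags in predictions.values())
        top = max(votes.values())
        tied = {tag for tag, count in votes.items() if count == top}
        # Break ties in favour of the most trusted tagger whose tag is tied.
        for name in priority:
            if predictions[name][i] in tied:
                combined.append(predictions[name][i])
                break
    return combined


# Toy predictions for a three-token sentence (illustrative tags only,
# not the Helsinki Corpus of Swahili tagset).
predictions = {
    "MXPOST": ["N", "V", "N"],
    "TnT": ["N", "V", "ADJ"],
    "MBT": ["N", "N", "ADJ"],
    "SVMTool": ["V", "V", "N"],
}
priority = ["MXPOST", "TnT", "MBT", "SVMTool"]
print(majority_vote(predictions, priority))
```

The third token is a two-against-two tie, so the result there depends entirely on the priority order; a weighted or cascaded scheme would replace that tie-breaking step with learned weights or a second-stage classifier.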

Countries

Kenya, Belgium

Related Organizations

Ghent University (Belgium)
University of Nairobi (Kenya)
University of the Western Cape (South Africa)
University of Antwerp (Belgium)

Keywords

Languages and Literatures

Impact by BIP!

Selected citations: 4. These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
Popularity: Average. This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
Influence: Average. This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
Impulse: Average. This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.

Green

Fields of Science

engineering and technology

electrical engineering, electronic engineering, information engineering
