Spelling Error Patterns in Brazilian Portuguese

Publication: Article, 01 Mar 2015, English
Publisher: MIT Press - Journals
Journal: Computational Linguistics, volume 41, pages 175-183 (ISSN: 0891-2017, eISSN: 1530-9312)

Authors: Priscila A. Gimenes; Norton Trevisan Roman; Ariadne M. B. R. Carvalho

doi: 10.1162/coli_a_00216

Abstract

Fifty years after Damerau set up his statistics for the distribution of errors in typed texts, his findings are still used in a range of different languages. Because these statistics were derived from texts in English, the question of whether they actually apply to other languages has been raised. We address this issue through the analysis of a set of typed texts in Brazilian Portuguese, deriving statistics tailored to this language. Results show that diacritical marks play a major role, as indicated by the frequency of mistakes involving them. This renders Damerau's original findings mostly unfit for spelling correction systems, although they remain useful if such marks are set aside. Furthermore, a comparison between these results and those published for Spanish shows no statistically significant differences between the two languages, an indication that the distribution of spelling errors depends on the adopted character set rather than on the language itself.
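To make the error categories concrete, the following is a minimal sketch of how a misspelling could be assigned to one of Damerau's four single-error classes (insertion, deletion, substitution, transposition), with a separate label for mistakes that involve only diacritical marks. It is an illustration, not the authors' procedure: the function names `strip_diacritics` and `classify_error`, the label strings, and the choice to treat diacritic-only mistakes as their own class are all assumptions made for this example.

```python
import unicodedata


def strip_diacritics(text: str) -> str:
    """Remove combining marks, so that 'ação' becomes 'acao'."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def classify_error(typed: str, intended: str) -> str:
    """Label a (typed, intended) word pair with a hypothetical error class.

    Assumption: a pair that differs only in diacritics is reported as
    'diacritic' before the four Damerau classes are tried.
    """
    # Compare composed forms so that 'ã' counts as a single character.
    typed = unicodedata.normalize("NFC", typed)
    intended = unicodedata.normalize("NFC", intended)

    if typed == intended:
        return "correct"
    if strip_diacritics(typed) == strip_diacritics(intended):
        return "diacritic"

    if len(typed) == len(intended):
        diffs = [i for i, (a, b) in enumerate(zip(typed, intended)) if a != b]
        if len(diffs) == 1:
            return "substitution"
        # Two adjacent positions whose characters are swapped.
        if (len(diffs) == 2 and diffs[1] == diffs[0] + 1
                and typed[diffs[0]] == intended[diffs[1]]
                and typed[diffs[1]] == intended[diffs[0]]):
            return "transposition"
    elif len(typed) == len(intended) + 1:
        # One extra character was typed.
        if any(typed[:i] + typed[i + 1:] == intended for i in range(len(typed))):
            return "insertion"
    elif len(typed) + 1 == len(intended):
        # One character is missing from the typed word.
        if any(intended[:i] + intended[i + 1:] == typed
               for i in range(len(intended))):
            return "deletion"

    # Anything else involves more than one single-character error.
    return "multiple"
```

Applying `classify_error` to each (typed, intended) pair in an annotated corpus and tallying the labels, for instance with `collections.Counter`, would give a frequency distribution of the kind the abstract describes. How diacritic mistakes are counted relative to the four classic categories is a design decision, and the split used here is only one possibility.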

Related Organizations

University of the South Pacific
Fiji
State University of Campinas
Brazil

Keywords

Computational linguistics. Natural language processing, P98-98.5

Related research products (8)

STEMMING BAHASA JAWA MENGGUNAKAN DAMERAU LEVENSHTEIN DISTANCE (DLD) (2021)
Real-Word Spelling Correction with Trigrams: A Reconsideration of the Mays, Damerau, and Mercer Model (2008)
Damerau Levenshtein Distance for Indonesian Spelling Correction (2019)
Paul Damerau. Kaiser Claudius II Gothicus (268-270 n.Chr.) (1935)
Bit-Parallel Approximate String Matching Algorithms with Transposition (2003)
Koreksi Ejaan Query Bahasa Indonesia Menggunakan Algoritme Damerau Levenshtein (2010)
Accelerating Levenshtein and Damerau edit distance algorithms using GPU with unified memory (2017)
Kombinasi Damerau Levenshtein dan Jaro-Winkler Distance Untuk Koreksi Kata Bahasa Inggris (2020)

Impact by BIP!

Selected citations: 4 (derived from selected sources; an alternative to the "Influence" indicator)
Popularity: Average (the "current" impact or attention of the article in the research community, based on the underlying citation network)
Influence: Average (the overall impact of the article in the research community, based on the underlying citation network, diachronically)
Impulse: Average (the initial momentum of the article directly after its publication, based on the underlying citation network)

Open access route: gold

Fields of Science

engineering and technology

electrical engineering, electronic engineering, information engineering
