We perform a large-scale analysis of diatopic language variation using geotagged microblogging datasets. By collecting all Twitter messages written in Spanish over more than two years, we build a corpus from which a carefully selected list of concepts allows us to characterize Spanish varieties on a global scale. A cluster analysis proves the existence of well-defined macroregions sharing common lexical properties. Remarkably enough, we find that the Spanish language is split into two superdialects, namely, an urban speech used across major American and Spanish cities and a diverse form that encompasses rural areas and small towns. The latter can be further clustered into smaller varieties with a stronger regional character.
10 pages, 5 figures
Big Data, FOS: Computer and information sciences, Physics - Physics and Society, Blogging, Dialectology, Science, microblogging, FOS: Physical sciences, Machine Learning (stat.ML), Physics and Society (physics.soc-ph), Machine Learning, Statistics - Machine Learning, Humans, [PHYS.PHYS] Physics [physics]/Physics [physics], Language, Social and Information Networks (cs.SI), language, Computer Science - Computation and Language, datasets, Q, R, Linguistics, Computer Science - Social and Information Networks, Medicine, Crowdsourcing, variation, Social Media, Computation and Language (cs.CL), Research Article
| Indicator | Value | Description |
|---|---|---|
| Selected citations | 60 | These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically). |
| Popularity | Top 10% | This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network. |
| Influence | Top 10% | This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically). |
| Impulse | Top 10% | This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network. |
| Views | 38 | |
| Downloads | 71 | |

Views provided by UsageCounts
Downloads provided by UsageCounts