Powered by OpenAIRE graph
Repositorio Documental UMNG
Bachelor thesis, 2018
License: CC BY-NC-ND

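Application of the CRISP-DM Methodology to the Collection and Analysis of Georeferenced Data from Twitter (original title: Aplicación de la metodología CRISP-DM a la recolección y análisis de datos georreferenciados desde Twitter)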

Author: García Vélez, Gustavo Adolfo

Abstract

Data mining is currently one of the fastest-growing and most successful areas of computing because it uncovers correlations and patterns through the analysis of large volumes of data. When the data are georeferenced, Geomatics becomes fundamental because it supplies the spatial component of the analysis. CRISP-DM® is a widely used methodology for data mining projects, consisting of six stages: business understanding, data understanding, data preparation, modeling, evaluation, and deployment. This work applies it to the analysis of georeferenced information from the Twitter® social network in order to find patterns that answer questions such as: Where in the city of Bogotá are the most geolocated tweets generated? In which cadastral sectors of Bogotá is a georeferenced tweet most likely to be found?

Especialización (specialization-level program)

Country
Colombia
Keywords

Twitter, Hot Spot Analysis, Kernel Density, Data mining, CRISP-DM, PostgreSQL, Automatic data collection systems, API, Python
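
The abstract's first question (where geolocated tweets concentrate) is the kind of problem a kernel density estimate addresses. The following is a minimal sketch of that idea, not the thesis's implementation, which this record does not show. The function name `kernel_density`, the Gaussian kernel, the bandwidth, the grid, and the sample coordinates are all assumptions made for illustration.

```python
import math


def kernel_density(points, grid, bandwidth):
    """Gaussian kernel density estimate at each grid location.

    points and grid are lists of (x, y) pairs in the same projected
    units (for example metres); bandwidth is the kernel standard deviation.
    """
    norm = 1.0 / (2.0 * math.pi * bandwidth ** 2 * len(points))
    densities = []
    for gx, gy in grid:
        total = 0.0
        for px, py in points:
            d2 = (gx - px) ** 2 + (gy - py) ** 2
            total += math.exp(-d2 / (2.0 * bandwidth ** 2))
        densities.append(norm * total)
    return densities


# Hypothetical tweet locations in projected metres (NOT data from the thesis).
tweets = [(100.0, 100.0), (120.0, 110.0), (110.0, 90.0), (900.0, 900.0)]

# Evaluate on a coarse 5 x 5 grid covering a 1 km square (assumed extent).
grid = [(x * 250.0, y * 250.0) for y in range(5) for x in range(5)]
density = kernel_density(tweets, grid, bandwidth=200.0)

# The grid location with the highest estimated density is the candidate hot spot.
hot_spot = grid[density.index(max(density))]
print(hot_spot)
```

With real data, the coordinates would first be projected from longitude and latitude to a metric coordinate system, and the bandwidth would need tuning, since it controls how far each tweet's influence spreads.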

  • BIP! impact indicators (derived from the underlying citation network)
    selected citations: 0
    popularity: Average
    influence: Average
    impulse: Average
Open access route: Green