Análisis de contenido de texto basado en procesamiento de lenguaje natural con BERT

González, Jairo; Angulo, Jesús; Andrés, Meza

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Article . 2022

Data sources: LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Publication: Article, 06 Dec 2022, Colombia, English. Publisher: Barranquilla, Universidad del Norte, 2022

handle: 10584/11216


Abstract

This article aims to alleviate problems related to content analysis. We discuss the use of different machine learning models for classification. We take this approach to address some problems of qualitative analysis, such as reliability over time and the decline of skilled labor, and to automate a process that usually requires considerable time and resources, such as trained people and long lead times. We explore different techniques, such as Random Forest and K-Nearest Neighbor, and try different bag-of-words methods to encode the text. We also evaluate a prototype of the proposed solution with Bidirectional Encoder Representations from Transformers (BERT) on a fake news detection dataset; this choice reflects scope limitations, but the approach is applicable to other corpora and other text contexts. Finally, we will use AWS services to implement a system that provides an API that ordinary users can call and integrate into their own classification systems.
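As an illustration of the bag-of-words plus K-Nearest Neighbor approach mentioned in the abstract, the following is a minimal sketch, not the authors' implementation. It uses only the Python standard library rather than a machine learning library. The training sentences, the labels `real` and `fake`, and the helper names `bag_of_words`, `cosine_similarity`, and `knn_predict` are all hypothetical and chosen for illustration.

```python
from collections import Counter
import math


def bag_of_words(text):
    """Encode a text as a sparse word-count vector (word order is ignored)."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def knn_predict(train, text, k=3):
    """Label a text by majority vote among its k most similar training texts."""
    query = bag_of_words(text)
    scored = sorted(
        ((cosine_similarity(query, bag_of_words(doc)), label) for doc, label in train),
        key=lambda pair: pair[0],
        reverse=True,
    )
    top_labels = [label for _, label in scored[:k]]
    return Counter(top_labels).most_common(1)[0][0]


# Hypothetical toy corpus; a real experiment would use a labeled news dataset.
TRAIN = [
    ("scientists publish peer reviewed study on vaccine safety", "real"),
    ("central bank announces interest rate decision", "real"),
    ("city council approves new budget for public schools", "real"),
    ("miracle cure doctors hate revealed in shocking secret", "fake"),
    ("shocking secret the government hides about aliens", "fake"),
    ("you will not believe this miracle weight loss trick", "fake"),
]

prediction = knn_predict(TRAIN, "shocking miracle trick doctors hate", k=3)
print(prediction)
```

The same structure carries over when the count vectors are replaced by another encoding (for example TF-IDF weights or BERT embeddings): only `bag_of_words` would change, while the nearest-neighbor vote stays the same.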

Country

Colombia

Related Organizations

Universidad del Norte
Colombia

Keywords

text classification, clasificación de texto, PLN, NLP, BERT

Impact by BIP!

selected citations: 0. These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
popularity: Average. This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
influence: Average. This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
impulse: Average. This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.
