Recolector de Ciencia Abierta, RECOLECTA
Bachelor thesis . 2025
License: CC BY-NC-ND

Fake news detection (Detección de noticias falsas)

Author: Rey Morales, José Antonio

Abstract

The rapid expansion of information on the internet[1][2][3] has facilitated the spread of fake news, posing critical risks to public trust, the perception of institutions, and social stability. This project analyzed several Natural Language Processing (NLP) techniques with the aim of developing a reliable model that detects patterns associated with disinformation. The text representations evaluated were TF-IDF, Bag of Words (BoW), Word2Vec, and the transformer-based model DistilBERT, combined with Logistic Regression, Support Vector Machine, and Random Forest classifiers. Dimensionality reduction with PCA and cross-validation were also applied to improve the efficiency and robustness of the models. However, the results showed that training a model solely on the body text of news articles is not enough for reliable predictions. The context of each article also needs to be captured, such as comments, sentiment analysis, the source, and similarity to other news articles. Obtaining accurate results is therefore more complex than applying advanced models to isolated texts. This work contributes to the state of the art by comprehensively comparing NLP techniques and classifiers and by offering a critical analysis of their viability for detecting fake news in real-world contexts.
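
As an illustration of the simplest text representation named in the abstract, the following is a minimal sketch of TF-IDF weighting written with the Python standard library only. It is not the thesis code: the function names, the whitespace tokenizer, the smoothed IDF variant, and the three toy headlines are all assumptions made for the example. It shows only the feature-extraction step, before any classifier is trained.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def tfidf(docs):
    """Return one {term: weight} dict per document, plus the IDF table.

    Assumed weighting: relative term frequency multiplied by a smoothed
    inverse document frequency, ln((1 + N) / (1 + df)) + 1.
    """
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)

    # Document frequency: number of documents that contain each term.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    idf = {t: math.log((1 + n_docs) / (1 + c)) + 1.0 for t, c in df.items()}

    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        vectors.append({t: (c / total) * idf[t] for t, c in counts.items()})
    return vectors, idf


# Toy headlines invented for this example; they are not from the thesis data.
toy_docs = [
    "shocking miracle cure doctors hate",
    "council approves budget for road repairs",
    "miracle cure found near road",
]

vectors, idf = tfidf(toy_docs)
for doc, vec in zip(toy_docs, vectors):
    top_term = max(vec, key=vec.get)
    print(f"{doc!r}: highest-weighted term = {top_term!r}")
```

Terms that occur in fewer documents receive a higher IDF, so the vectors emphasize words that distinguish one article from the others. In a full pipeline these vectors would be passed to a classifier such as Logistic Regression and scored with cross-validation.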

Keywords

fake news, Artificial intelligence -- TFG, NLP, ML