
handle: 10609/151968
The rapid expansion of information on the internet[1][2][3] has facilitated the spread of fake news, posing critical risks to public trust, the perception of institutions, and social stability. This project analyzed several Natural Language Processing (NLP) techniques with the aim of developing a reliable model that detects patterns associated with misinformation. Methods such as TF-IDF, Bag of Words (BoW), Word2Vec, and the transformer-based model DistilBERT were evaluated, together with classifiers such as Logistic Regression, Support Vector Machines, and Random Forest. Dimensionality reduction with PCA and cross-validation were also applied to improve the efficiency and robustness of the models. However, the results showed that training a model solely on the body text of news articles is not sufficient for reliable predictions. It is also necessary to capture the context of each article, such as comments, sentiment analysis, the source, and similarities with other articles. Obtaining accurate results is therefore more complex than applying advanced models to isolated texts. This work contributes to the state of the art by exhaustively comparing various NLP techniques and classifiers and by offering a critical analysis of their viability for detecting fake news in real-world contexts.
fake news, Intel·ligència artificial -- TFG, Artificial intelligence -- TFG, NLP, ML
