Economics and Culture
Article . 2025 . Peer-reviewed
License: CC BY NC ND
Data sources: Crossref
Economics and Culture
Article . 2025
Data sources: DOAJ

This Research product is the result of merged Research products in OpenAIRE.

The Technological Bridge: R Programming’s Utility in Converting Social Media Data for Quantitative Financial Analysis

Authors: Alexey Litvinenko; Saarinen Samuli; Anna Litvinenko;

Abstract

Research purpose. This study explores whether R programming can transform unstructured qualitative social media data into a quantitative format suitable for econometric modelling. It specifically examines how elements such as text, emojis, and sentiment from Reddit and X (formerly Twitter) can be converted into variables for regression analysis. With the aim of enhancing the predictive power of traditional financial models through alternative data sources, the paper sets out step-by-step technical guidelines: scripting API access to extract data from Reddit and X, cleaning and tokenising the text, and incorporating the resulting variables into regression models in R. The study addresses the growing need in financial economics to incorporate alternative data streams by offering a structured, replicable process for transforming high-volume, unstructured online content into statistically valid variables, thereby bridging the gap between qualitative market sentiment and quantitative modelling.

Design / Methodology / Approach. Focusing on the methodology and R scripts, this research adopts a quantitative approach. It transforms qualitative social media data into a format suitable for multiple linear regression and instrumental variable regression models, which are used to assess the effect of social media signals on asset prices, with GameStop (GME) and Best Buy (BBY) as case studies. The process is designed for reproducibility and includes open-source code, enhancing transparency and applicability in both academic and professional financial data analysis.

Findings. The findings demonstrate that qualitative social media data can be quantified for financial analysis: the data were successfully extracted, cleaned, and used in regression analysis. Results show that traditional market indicators fail to explain GME's price shifts, while the frequency of rocket emojis (interpreted as speculative sentiment) was statistically significant. BBY's returns, however, aligned more closely with market and industry indices, suggesting a lower influence of private sentiment.

Originality / Value / Practical implications. The research provides a replicable method for integrating social media data into econometric models, contributing new tools for analysing market sentiment. By adapting classical financial models to modern data sources, the paper opens new directions for asset pricing research. The technical tools, written in R for econometric analysis, are useful to both academics and practitioners.
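
The steps named in the abstract (clean the text, count an emoji, aggregate to a daily variable, enter it into a regression) can be illustrated with a minimal base R sketch. This is not the authors' script. The posts, the returns, the column names, and the helper functions `clean_text` and `count_rockets` are all hypothetical, and the toy numbers carry no empirical meaning. The sketch also uses plain multiple linear regression only and does not cover the API extraction or the instrumental variable models.

```r
# Illustrative sketch only: toy posts and toy returns, not the paper's data or scripts.
rocket <- "\U0001F680"  # the rocket emoji as a UTF-8 string

# Hypothetical posts standing in for text pulled from Reddit or X.
posts <- data.frame(
  date = as.Date("2024-01-01") + c(0, 0, 1, 2, 2, 3, 4, 5),
  text = c(
    paste0("To the moon ", rocket, rocket),
    "Holding, see https://example.com/thread",
    paste0(rocket, " ", rocket, " ", rocket, " buy"),
    "Earnings look weak",
    paste0("Still in ", rocket),
    "Trading halted?",
    paste0(rocket, rocket, rocket, rocket),
    "Selling now"
  ),
  stringsAsFactors = FALSE
)

# Cleaning step (assumed): strip URLs and lower-case the text.
clean_text <- function(x) tolower(gsub("https?://[^[:space:]]+", "", x))

# Quantification step: count rocket emojis in each post.
count_rockets <- function(x) {
  lengths(regmatches(x, gregexpr(rocket, x, fixed = TRUE)))
}

posts$rockets <- count_rockets(clean_text(posts$text))

# Aggregate to one observation per day.
daily <- aggregate(rockets ~ date, data = posts, FUN = sum)

# Hypothetical daily returns for a stock (ret) and a market index (mkt).
returns <- data.frame(
  date = as.Date("2024-01-01") + 0:5,
  ret  = c(0.05, 0.09, 0.02, -0.03, 0.11, -0.04),
  mkt  = c(0.010, -0.005, 0.002, -0.010, 0.004, -0.003)
)

# Join the emoji variable to the returns and fit a multiple linear regression.
panel <- merge(daily, returns, by = "date")
fit <- lm(ret ~ mkt + rockets, data = panel)
print(coef(fit))
```

In this sketch the daily emoji count enters the model as an ordinary regressor alongside the market return, so `summary(fit)` would report its coefficient and significance in the usual way. Real data would require handling for days without posts and for non-trading days.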

Keywords

R programming, Economics as a science, econometric analysis, HF5001-6182, CAPM, G14, C58, Business, price non-synchronization, C87, social media data, HB71-74

  • BIP!
    Impact by BIP!
    selected citations: 0
    These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
    popularity: Average
    This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
    influence: Average
    This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
    impulse: Average
    This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.
Published in a Diamond OA journal