Sentiment Analysis of COVID-19 Scientific Publication Dissemination on Social Media X: A Dataset Analyzed with ChatGPT 3.5 and Gemini 1.5 Flash

Análise de Sentimento da Divulgação de Publicação Científica sobre COVID-19 na Rede Social X: Um Conjunto de Dados Analisado com ChatGPT 3.5 e Gemini 1.5 Flash

Research data: Dataset. 24 Feb 2025. Publisher: Zenodo

Authors: Pontes, Danielle; Maricato, João de Melo

doi: 10.5281/zenodo.14919673, 10.5281/zenodo.14919674

Abstract

This dataset contains a sample of posts on X that mentioned the editorial “Dying in a Leadership Vacuum” (DOI: 10.1056/NEJMe2029812), published in The New England Journal of Medicine on October 7, 2020. The posts were extracted from the Altmetric platform using a Python 3.12 script with the Beautiful Soup 4.12 library in the Google Colab development environment. The result was a set of 9,792 posts on X that commented specifically on the editorial.

These posts came from 5,601 unique profiles, which were cross-referenced with the profiles classified and made available in the dataset created by Pontes and Maricato (2023a). From the accounts that already had a classification (bot or human), 41 accounts that had made more than four posts were selected. According to the dataset provided by Pontes and Maricato (2023), Botometer classified 10 of these accounts as bots and 31 as human.

Because Pontes and Maricato (2023) highlighted the limitations of Botometer for classifying accounts in the altmetric attention network, the 41 selected accounts were also classified manually. The manual classification was based on criteria such as the number of posts, posting times, time intervals between posts, account creation dates, and profile pictures. It identified 20 accounts as bots and 21 as human. The classified accounts made a total of 3,493 posts, which are included in this dataset and were used in the analyses presented in the article.

The metadata structure of the dataset is presented below.

Variable Name: ACCOUNT
Data Type: String (Text)
Description: Anonymized account code to preserve user identity.
Possible Values: ACCOUNT + sequential number

Variable Name: ACCOUNT CLASS (BTM)
Data Type: Categorical
Description: Automatic account classification using a tool such as Botometer.
Possible Values: human, bot

Variable Name: ACCOUNT CLASS (MANUAL)
Data Type: Categorical
Description: Manual account classification based on researcher analysis.
Possible Values: human, bot

Variable Name: POST CONTENT
Data Type: Text (String)
Description: Full text of the collected post.

Variable Name: SENTIMENT CLASS (GPT)
Data Type: Categorical
Description: Sentiment classification assigned by ChatGPT.
Possible Values: positive, neutral, negative

Variable Name: SENTIMENT CLASS (GEMINI)
Data Type: Categorical
Description: Sentiment classification assigned by Gemini.
Possible Values: positive, neutral, negative

Variable Name: GPT X GEM (MATCH/DIFFERENCE)
Data Type: Binary
Description: Indicates whether ChatGPT and Gemini assigned the same sentiment classification or different ones.
Possible Values: match, different

Variable Name: POST CLASS (MANUAL)
Data Type: Categorical
Description: Manual sentiment classification of the post.
Possible Values: positive, neutral, negative

Variable Name: GPT RESULT
Data Type: Categorical
Description: Evaluation of ChatGPT's classification against the manual standard.
Possible Values: correct, incorrect

Variable Name: GEMINI RESULT
Data Type: Categorical
Description: Evaluation of Gemini's classification against the manual standard.
Possible Values: correct, incorrect
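The three comparison variables in the metadata (GPT X GEM (MATCH/DIFFERENCE), GPT RESULT, GEMINI RESULT) can be illustrated with a minimal Python sketch. It assumes they are plain label comparisons between the three sentiment columns; the record does not state how the authors actually computed them. The sample rows, the helper names `derive_columns` and `share`, and the resulting figures are invented for illustration and are not taken from the dataset.

```python
# Hypothetical rows using the column names from the metadata above.
# The labels are invented for illustration, NOT taken from the dataset.
SAMPLE_ROWS = [
    {"SENTIMENT CLASS (GPT)": "negative", "SENTIMENT CLASS (GEMINI)": "negative",
     "POST CLASS (MANUAL)": "negative"},
    {"SENTIMENT CLASS (GPT)": "neutral", "SENTIMENT CLASS (GEMINI)": "negative",
     "POST CLASS (MANUAL)": "negative"},
    {"SENTIMENT CLASS (GPT)": "positive", "SENTIMENT CLASS (GEMINI)": "positive",
     "POST CLASS (MANUAL)": "positive"},
    {"SENTIMENT CLASS (GPT)": "neutral", "SENTIMENT CLASS (GEMINI)": "positive",
     "POST CLASS (MANUAL)": "negative"},
]


def derive_columns(row):
    """Return a copy of the row with the three comparison columns filled in.

    Assumption: each derived column is a simple equality check on
    normalized labels (whitespace stripped, lowercased).
    """
    gpt = row["SENTIMENT CLASS (GPT)"].strip().lower()
    gem = row["SENTIMENT CLASS (GEMINI)"].strip().lower()
    manual = row["POST CLASS (MANUAL)"].strip().lower()

    out = dict(row)
    out["GPT X GEM (MATCH/DIFFERENCE)"] = "match" if gpt == gem else "different"
    out["GPT RESULT"] = "correct" if gpt == manual else "incorrect"
    out["GEMINI RESULT"] = "correct" if gem == manual else "incorrect"
    return out


def share(rows, column, value):
    """Fraction of rows whose `column` equals `value`."""
    return sum(1 for r in rows if r[column] == value) / len(rows)


derived = [derive_columns(r) for r in SAMPLE_ROWS]
agreement = share(derived, "GPT X GEM (MATCH/DIFFERENCE)", "match")
gpt_accuracy = share(derived, "GPT RESULT", "correct")
gemini_accuracy = share(derived, "GEMINI RESULT", "correct")

print(f"GPT/Gemini agreement: {agreement:.2f}")
print(f"GPT accuracy vs. manual: {gpt_accuracy:.2f}")
print(f"Gemini accuracy vs. manual: {gemini_accuracy:.2f}")
```

Applied to the real file, the same functions could be run over rows read with `csv.DictReader`, provided the file is exported as CSV with these exact column headers, which is also an assumption here.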

Related Organizations

University of the State of Amazonas
Brazil
University of Brasília
Brazil

Impact by BIP!

Selected citations: 0. These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
Popularity: Average. This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
Influence: Average. This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
Impulse: Average. This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.

Related to Research communities

Corona Virus Disease