MAPPING THE DIGITAL SCIENTIFIC DEBATE ON AI: DISCIPLINARY NARRATIVES, PLATFORM DYNAMICS, AND THE ROLE OF MEDIA AND COMMUNICATION

This dataset accompanies the article Mapping the Digital Scientific Debate on AI: Disciplinary Narratives, Platform Dynamics, and the Role of Media and Communication. It contains the final analytical sample of 6,215 social media posts discussing artificial intelligence in relation to scientific research, drawn from an initial corpus of 9,844 posts published during the first half of 2025 across Instagram, X, TikTok, LinkedIn, and Bluesky. The dataset was designed to support transparent and reusable research on how AI is publicly debated as a scientific tool across disciplines and platforms.

To maximize privacy protection, it does not include raw post text, usernames, profile data, URLs, or other direct identifiers. Instead, it only contains LLM-inferred and derived analytical fields, following a strict principle of irreversible anonymization and data minimization. This makes the dataset suitable for secondary analysis of discursive patterns while substantially reducing re-identification risks.

The annotation workflow combined computational content analysis with Large Language Models. First, posts were classified into OECD/FORD research fields, with posts not clearly related to academic research excluded from the final released dataset. Second, the valid research-related posts were coded through a closed codebook for dimensions such as disciplinary focus, content type, general topic, AI sentiment, AI stance, risks, opportunities, audience, framing, and synthetic discourse indicators. The final dataset therefore captures structured interpretive variables rather than original social media content.

This resource is intended for researchers interested in science communication, platform studies, AI discourse, computational social science, and the public understanding of science. Because the released file only contains inferred variables, it is especially useful for reproducible quantitative analyses of narrative patterns, disciplinary differences, framing strategies, and platform-specific dynamics without redistributing identifiable platform content.

FORD field legend

0 = None / Not clearly research
1 = Natural sciences
2 = Engineering and technology
3 = Medical and health sciences
4 = Agricultural and veterinary sciences
5 = Social sciences
6 = Humanities and the arts
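
As a minimal sketch of how the FORD legend might be applied in a secondary analysis, the Python snippet below maps numeric codes to their labels and counts records per field. The column names ford_field and platform and the sample rows are hypothetical, chosen only for illustration; the description does not state the released file's actual column names or format, so they should be adjusted to match the file.

```python
from collections import Counter

# FORD field legend, copied from the dataset description.
FORD_LABELS = {
    0: "None / Not clearly research",
    1: "Natural sciences",
    2: "Engineering and technology",
    3: "Medical and health sciences",
    4: "Agricultural and veterinary sciences",
    5: "Social sciences",
    6: "Humanities and the arts",
}


def label_ford(code):
    """Return the legend label for a FORD code; raise ValueError if unknown."""
    try:
        return FORD_LABELS[int(code)]
    except KeyError:
        raise ValueError(f"Unknown FORD code: {code!r}") from None


def count_by_field(rows, column="ford_field"):
    """Count rows per FORD label, skipping code 0 (not clearly research).

    `column` is an assumed name, not one confirmed by the dataset description.
    """
    counts = Counter()
    for row in rows:
        code = int(row[column])
        if code == 0:
            continue  # the description says such posts are excluded from the release
        counts[label_ford(code)] += 1
    return counts


# Hypothetical rows standing in for records read from the released file.
rows = [
    {"ford_field": "3", "platform": "LinkedIn"},
    {"ford_field": "5", "platform": "Bluesky"},
    {"ford_field": "3", "platform": "X"},
    {"ford_field": "0", "platform": "TikTok"},
]

counts = count_by_field(rows)
for field, n in counts.most_common():
    print(f"{field}: {n}")
```

With a real file, the rows could come from csv.DictReader or a similar reader; the explicit check for code 0 is a safeguard, since the description indicates that posts not clearly related to research were already removed from the released dataset.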

Related Organizations

University of the Basque Country, Spain
Tula University, Russian Federation

Keywords

Artificial Intelligence; Science Communication; Social Media; Computational Content Analysis; Media and Communication; Epistemic Cultures; Research Evaluation; Platform Studies; LLM-assisted Classification; FORD Classification.

