Powered by OpenAIRE graph
Dataset
Data sources: ZENODO

Indonesian Electric Vehicle (EV) Public Sentiment Dataset from YouTube Comments (2024–2026) and Indonesian Electric Vehicle (EV) Slang Normalization

Authors: Navaly; Dienengsari, Zahra Desvira; Deyis, Muhammad Thariq Sultan

Abstract

1. EV YouTube Sentiment Dataset (CSV)

This dataset contains a curated collection of 3,549 public comments extracted from Indonesian-language YouTube videos discussing Electric Vehicles (EVs). The data were collected using the YouTube Data API v3, targeting videos published between 2021 and 2026. To capture the most recent shifts in public discourse on EV adoption, pricing, battery reliability, and infrastructure in Indonesia, only comments posted between 2024 and 2026 were kept.

Preprocessing removed URLs, emojis, duplicate entries, and irrelevant spam to keep the text structurally consistent. A selective normalization strategy was applied: domain-specific EV terms were standardized (using the dictionary available at https://doi.org/10.5281/zenodo.20567243), while general social media slang was intentionally retained to preserve linguistic nuance for natural language processing (NLP) model evaluation.

Sentiment labeling was performed with an automated GPT-based approach (using the GPT 5.4 mini model) that classifies each comment into one of three polarity categories: positive, negative, or neutral. Each record includes a model-generated confidence score to provide transparency about annotation quality. Comments with confidence scores ≥ 0.75 were retained automatically, while lower-confidence predictions were manually reviewed and annotated.

Dataset Structure & Schema

The dataset is provided in tabular format (CSV) with the following 6 columns:

1. date: The timestamp indicating when the user posted the comment (filtered to the 2024–2026 period).
2. author_display_name: The masked or public display name of the YouTube user who authored the comment.
3. text_display: The preprocessed and selectively normalized text of the comment.
4. like_count: The total number of likes received by the comment, serving as a measure of community agreement or engagement.
5. label: The predicted sentiment category assigned by the language model (positive, negative, or neutral).
6. confidence: A float value indicating the model's confidence in its sentiment prediction.

2. Indonesian EV Slang Normalization Dictionary (CSV)

This file contains a normalization dictionary of Indonesian slang words, abbreviations, misspellings, and informal expressions frequently found in EV-related discussions on Indonesian social media. The dictionary focuses on terms associated with EV adoption, vehicle models, battery technology, charging infrastructure, pricing, and general automotive topics. It is intended to support reproducible preprocessing workflows and may be reused in future Indonesian NLP research involving EV-related text.
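
The schema, the 0.75 confidence threshold, and the selective normalization described above can be illustrated with a minimal Python sketch. It is not the authors' pipeline. The six column names and the threshold come from the description. Everything else is an assumption made for illustration: the sample rows, the timestamp format, the helper names (`load_comments`, `split_by_confidence`, `normalize_ev_terms`), and the two dictionary entries, which are invented and not taken from the published dictionary. The description does not give the dictionary file's column layout, so the mapping is passed in as a plain Python dict. The sketch also assumes that manually reviewed rows keep the model's original confidence score, which the description does not confirm.

```python
import csv
import io
import re

# Column names and label set as listed in the dataset description.
COLUMNS = ["date", "author_display_name", "text_display",
           "like_count", "label", "confidence"]
LABELS = {"positive", "negative", "neutral"}
CONFIDENCE_THRESHOLD = 0.75  # threshold stated in the description

# Invented sample rows (not real records); the timestamp format is an assumption.
SAMPLE_CSV = """date,author_display_name,text_display,like_count,label,confidence
2024-05-01T10:00:00Z,user_a,harga mobil listrik masih mahal,12,negative,0.91
2025-02-11T08:30:00Z,user_b,baterai awet sih,3,positive,0.62
"""

# Hypothetical dictionary entries, for illustration only.
SAMPLE_MAPPING = {"molis": "motor listrik", "batre": "baterai"}


def load_comments(handle):
    """Read comment rows from a CSV file handle and convert numeric columns."""
    reader = csv.DictReader(handle)
    missing = set(COLUMNS) - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    rows = []
    for row in reader:
        if row["label"] not in LABELS:
            raise ValueError(f"unexpected label: {row['label']!r}")
        row["like_count"] = int(row["like_count"])
        row["confidence"] = float(row["confidence"])
        rows.append(row)
    return rows


def split_by_confidence(rows, threshold=CONFIDENCE_THRESHOLD):
    """Separate rows at or above the threshold from those below it."""
    auto = [r for r in rows if r["confidence"] >= threshold]
    review = [r for r in rows if r["confidence"] < threshold]
    return auto, review


def normalize_ev_terms(text, mapping):
    """Replace whole-word dictionary terms; all other slang is left untouched."""
    if not mapping:
        return text
    # Longer keys first so multi-word terms win over their prefixes.
    keys = sorted(mapping, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(k) for k in keys) + r")\b",
        re.IGNORECASE,
    )
    # Keys in the mapping are expected to be lowercase.
    return pattern.sub(lambda m: mapping[m.group(0).lower()], text)


if __name__ == "__main__":
    rows = load_comments(io.StringIO(SAMPLE_CSV))
    auto, review = split_by_confidence(rows)
    print(len(auto), "rows at or above threshold;", len(review), "below")
    print(normalize_ev_terms("batre molis nya awet bgt", SAMPLE_MAPPING))
```

To run this against the published file, replace `io.StringIO(SAMPLE_CSV)` with an opened file handle, for example `open(path, newline="", encoding="utf-8")`, where `path` points to the downloaded CSV. Because `text_display` is already normalized in the published data, `normalize_ev_terms` is only relevant when applying the dictionary to new raw comments.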
