JISA (Jurnal Informatika dan Sains)
Article . 2024 . Peer-reviewed
License: CC BY SA
Data sources: Crossref

This Research product is the result of merged Research products in OpenAIRE.

Automating the Extraction of Words and Topics in Indonesian Using the Term Frequency-Inverse Document Frequency Algorithm and Latent Dirichlet Allocation

Authors: Lalu Mutawalli; Mohammad Taufan Asri Zaen; Muhammad Fauzi Zulkarnaen;

Abstract

Keyword extraction and topic modeling are important tools for analyzing Indonesian-language Gojek user reviews. Keyword extraction reveals user preferences and needs, and topic modeling groups reviews into distinct topics, giving stakeholders information they can use to improve services. This research applies TF-IDF and LDA to text data from Gojek user reviews and feedback. The data spans November 5, 2021, to January 2, 2024, and totals 225,002 rows. Each row includes username, content, time, and app version; the analysis focuses on the review content. Reviews average 8 words in length, with a maximum of 104 and a minimum of a few words, and this variability indicates a non-normal distribution. Preprocessing is applied to preserve the accuracy of the topic analysis. TF-IDF is used to extract relevant keywords, while LDA is used to model the topics in the reviews. The topic analysis reveals six patterns in Gojek user reviews. The first topic covers experience, services, and affordable pricing. The second emphasizes app usability and benefits. The third relates to promos, discounts, and vouchers. The fourth reflects positive evaluations of service quality. The fifth, in contrast, highlights high costs and app issues. The sixth underscores overall user satisfaction and service convenience. Evaluation of the topic model yielded a coherence score of 0.509, indicating that its topics show a good level of consistency in capturing relevant themes from the Gojek review data. Combining TF-IDF and LDA for Indonesian text analysis, particularly for Gojek user reviews, is an important step toward better understanding and using text data to improve the overall user experience.
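The two steps named in the abstract, TF-IDF keyword extraction followed by LDA topic modeling, can be illustrated with a minimal from-scratch sketch. This is not the authors' implementation: the abstract does not say which library, TF-IDF variant, or LDA inference method was used. The toy reviews, the function names (`tokenize`, `tfidf`, `top_keywords`, `lda_gibbs`), the unsmoothed `ln(N / df)` weighting, the collapsed Gibbs sampler, and all hyperparameter values below are assumptions chosen for illustration only.

```python
import math
import random
from collections import Counter

# Hypothetical toy reviews, NOT taken from the Gojek dataset.
REVIEWS = [
    "driver ramah dan cepat",
    "promo diskon voucher banyak",
    "aplikasi mudah dan membantu",
    "harga mahal aplikasi sering error",
    "promo voucher diskon murah",
    "driver cepat pelayanan bagus",
]

def tokenize(text):
    """Lowercase and split on whitespace (real preprocessing would do more)."""
    return text.lower().split()

def tfidf(docs):
    """One {term: weight} dict per document; tf = count / length, idf = ln(N / df)."""
    n_docs = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({t: (c / len(doc)) * math.log(n_docs / df[t])
                        for t, c in counts.items()})
    return weights

def top_keywords(docs, k=5):
    """Rank terms by their TF-IDF weight summed over all documents."""
    totals = Counter()
    for doc_weights in tfidf(docs):
        totals.update(doc_weights)
    return [term for term, _ in totals.most_common(k)]

def lda_gibbs(docs, n_topics=2, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampling for LDA; returns doc-topic and topic-word counts."""
    rng = random.Random(seed)
    vocab_size = len({t for doc in docs for t in doc})
    doc_topic = [[0] * n_topics for _ in docs]
    topic_word = [Counter() for _ in range(n_topics)]
    topic_total = [0] * n_topics
    assign = []
    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        zs = [rng.randrange(n_topics) for _ in doc]
        for w, z in zip(doc, zs):
            doc_topic[d][z] += 1
            topic_word[z][w] += 1
            topic_total[z] += 1
        assign.append(zs)
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                z = assign[d][i]
                doc_topic[d][z] -= 1
                topic_word[z][w] -= 1
                topic_total[z] -= 1
                # Resample the topic from its conditional distribution.
                probs = [(doc_topic[d][k] + alpha) * (topic_word[k][w] + beta)
                         / (topic_total[k] + vocab_size * beta)
                         for k in range(n_topics)]
                z = rng.choices(range(n_topics), weights=probs)[0]
                assign[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][w] += 1
                topic_total[z] += 1
    return doc_topic, topic_word

if __name__ == "__main__":
    docs = [tokenize(r) for r in REVIEWS]
    print("keywords:", top_keywords(docs, k=5))
    _, topic_word = lda_gibbs(docs, n_topics=2)
    for k, words in enumerate(topic_word):
        print("topic", k, [w for w, _ in words.most_common(4)])
```

On a corpus the size of the one described in the abstract, a maintained library implementation would be the practical choice. The sketch also omits the Indonesian-specific preprocessing and the coherence evaluation that the abstract mentions.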

Keywords

QA76.75-76.765, tf-idf, lda, topic modeling, Information technology, Computer software, word extraction, T58.5-58.64, preferences

  • BIP!
    Impact by BIP!
    selected citations: 0
    These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
    popularity: Average
    This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
    influence: Average
    This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
    impulse: Average
    This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.