Comparative analysis of TF-IDF and loglikelihood method for keywords extraction of twitter data

Publication: Article, Other literature type
Date: 01 Jan 2023
Publisher: Mehran University of Engineering and Technology
Journal: Mehran University Research Journal of Engineering and Technology, volume 42, page 88 (issn: 0254-7821, eissn: 2413-7219)

Authors: Muhammad Adeel Abid; Muhammad Faheem Mushtaq; Urooj Akram; Mateen Ahmed Abbasi; Furqan Rustam;

doi: 10.22581/muet1982.2301.09, 10.60692/xapbn-nhb43, 10.60692/r32mb-f3t80

Abstract

Twitter has become one of the foremost social media platforms in today's world. Over 335 million users are online monthly, and about 80% of them access it through mobile devices. Twitter now supports 35+ languages, which further broadens its usage and serves people who speak different languages. About 21% of all users are from the US and 79% are from outside the US. A tweet is restricted to 140 characters; hence the information it contains is concise and valuable. An estimated 500 million tweets are sent per day by different categories of people, including teachers, students, celebrities, officers, and musicians. This produces a huge amount of data, growing daily, that needs to be categorized. A key step is finding the keywords in this data that help identify a Twitter user for classification. For this purpose, the Term Frequency-Inverse Document Frequency (TF-IDF) and Loglikelihood methods are used to extract keywords from tweets in the music field, and a comparative analysis is performed on both sets of results. Finally, the relevance of the extracted keywords is judged by 5 users so that a decision can be made, on the basis of the experiments, about which method is best. This analysis is valuable because it gives a more accurate estimate of which method's results are more reliable.
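
The two scoring methods named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not state which TF-IDF variant, tokenizer, or reference corpus was used, so the whitespace tokenizer, the summed tf * idf weighting, the Dunning-style two-corpus log-likelihood (G2) formula, and the toy tweets below are all assumptions made for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokenization; real tweets need more cleaning.
    return text.lower().split()


def tfidf_scores(docs):
    """Sum tf * idf over all documents for each term (one common variant)."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))  # document frequency: count each term once per doc
    scores = Counter()
    for tokens in tokenized:
        if not tokens:
            continue  # skip empty tweets to avoid dividing by zero
        for term, count in Counter(tokens).items():
            tf = count / len(tokens)
            idf = math.log(n_docs / df[term])
            scores[term] += tf * idf
    return scores


def log_likelihood_scores(target_docs, reference_docs):
    """G2 score of each term in the target corpus against a reference corpus.

    Only terms that are relatively more frequent in the target are kept,
    since those are the keyword candidates.
    """
    target = Counter(t for d in target_docs for t in tokenize(d))
    ref = Counter(t for d in reference_docs for t in tokenize(d))
    n_t, n_r = sum(target.values()), sum(ref.values())
    scores = {}
    for term, a in target.items():
        b = ref.get(term, 0)
        if a / n_t <= b / n_r:
            continue  # not over-represented in the target corpus
        e1 = n_t * (a + b) / (n_t + n_r)  # expected count in target
        e2 = n_r * (a + b) / (n_t + n_r)  # expected count in reference
        g2 = a * math.log(a / e1)
        if b > 0:
            g2 += b * math.log(b / e2)
        scores[term] = 2 * g2
    return scores


# Hypothetical toy data, invented for this example only.
music_tweets = [
    "new album drop tonight",
    "love this album and the guitar solo",
    "concert tonight with the band",
]
other_tweets = [
    "the match starts tonight",
    "traffic is bad in the city",
    "love the weather today",
]

tfidf = tfidf_scores(music_tweets)
ll = log_likelihood_scores(music_tweets, other_tweets)
print("TF-IDF top terms:", [t for t, _ in tfidf.most_common(5)])
print("Loglikelihood top terms:", sorted(ll, key=ll.get, reverse=True)[:5])
```

One design difference is visible even in this sketch: TF-IDF scores terms using only the target collection, whereas the log-likelihood score needs a second, contrasting corpus, so the two rankings depend on different inputs.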

Related Organizations

Islamia University of Bahawalpur
Pakistan
Khwaja Fareed University of Engineering and Information Technology
Pakistan

Keywords

Technology, Science, FOS: Political science, FOS: Law, Quantum mechanics, Term (time), tf–idf, Data science, Social media, Artificial Intelligence, Computer security, Field (mathematics), Multi-label Text Classification in Machine Learning, FOS: Mathematics, Information retrieval, Key (lock), Political science, T, Physics, Q, Pure mathematics, Engineering (General). Civil engineering (General), Computer science, Automatic Keyword Extraction from Textual Data, World Wide Web, Computer Science, Physical Sciences, Relevance (law), TA1-2040, Textual Data, Law, Mathematics

Impact by BIP!

selected citations: 13. These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
popularity: Top 10%. This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
influence: Top 10%. This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
impulse: Top 10%. This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.