Trust and Believe – Should We? Evaluating the Trustworthiness of Twitter Users

This model analyzes Twitter users and assigns each a score computed from their social profile, the credibility of their tweets, and an h-index-style score over their tweets. Users with a higher score are considered more influential, and their tweets are considered more credible. The model is based on both user-level and content-level features of a Twitter user. The details of feature extraction and of computing the Influence score are given in the paper.

Description

We used Python to extract the features from Twitter and generate the dataset. The modAL framework selects ambiguous data points from the unlabeled data pool using three different sampling techniques, and a human manually annotates the selected points. We generated a dataset of 50,000 Twitter users and then used different classifiers to label each user as either Trusted or Untrusted.

Organization

The project consists of the following files:

Dataset.csv
The dataset consists of different features of 50,000 Twitter users (politicians), without labels.

Manually_labeled-Dataset.csv
This CSV file contains the Twitter users that were manually classified as Trusted or Untrusted.

feature_extraction.py
This Python script calculates the Influence score of a Twitter user and is used to generate the dataset. The Influence score is based on:
- Social reputation of the user
- Content score of the tweets
- Tweet credibility
- Index score for the number of retweets and likes

Activelearner.ipynb
To classify a large pool of unlabeled data, we used an active-learning model (modAL framework): a semi-supervised approach that is ideal when unlabeled data is abundant but manual labeling is expensive.
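A minimal sketch of such an uncertainty-sampling loop, written directly against scikit-learn in the spirit of the modAL setup; the synthetic data, model choice, and loop sizes are illustrative assumptions, not the project's actual configuration:

```python
# Sketch of an uncertainty-sampling active-learning loop (modAL-style).
# The synthetic pool and the logistic-regression model are assumptions for illustration.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Stand-in for the unlabeled feature pool extracted for each Twitter user.
X_pool = rng.normal(size=(300, 4))
true_labels = (X_pool[:, 0] + X_pool[:, 1] > 0).astype(int)  # plays the human oracle

# Seed the learner with a few labeled examples from each class.
pos = list(np.flatnonzero(true_labels == 1)[:5])
neg = list(np.flatnonzero(true_labels == 0)[:5])
labeled = pos + neg
unlabeled = set(range(len(X_pool))) - set(labeled)

clf = LogisticRegression()
for _ in range(30):
    clf.fit(X_pool[labeled], true_labels[labeled])
    # Uncertainty sampling: query the point whose top-class probability is lowest.
    idx = np.array(sorted(unlabeled))
    confidence = clf.predict_proba(X_pool[idx]).max(axis=1)
    pick = int(idx[np.argmin(confidence)])
    labeled.append(pick)   # in the real pipeline, a human annotates this point
    unlabeled.remove(pick)

accuracy = clf.score(X_pool, true_labels)
```

In the actual notebook, modAL's `ActiveLearner` wraps this query/teach loop, and the queried labels come from a human annotator rather than a synthetic oracle.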
The active learner selects ambiguous data points from the unlabeled data pool using three different sampling techniques, and a human manually annotates the selected points. We then use four different classifiers (Support Vector Machine, Logistic Regression, Multilayer Perceptron, and Random Forest) to classify each Twitter user as either Trusted or Untrusted.

twitter_reputation.ipynb
We used different regression models to test their performance on our generated dataset (this was only for testing and is no longer part of our work). We train and evaluate three regression models:
1. Multilayer perceptron
2. Deep neural network
3. Linear regression

twitter_credentials.py
To extract the features of Twitter users, one must first authenticate using the credentials given in this file.

Screen names (Screen_name_1.txt, Screen_name_2.txt, Screen_name_3.txt)
These text files contain the Twitter screen names of the users, all of whom are politicians. We removed the names of all politicians whose accounts are private, as well as those who have no followers/followings. The text of the tweets is not saved. We also removed duplicate names.

References
[1] https://stackoverflow.com/questions/38881314/twitter-data-to-csv-getting-error-when-trying-to-add-to-csv-file
[2] https://stackoverflow.com/questions/48157259/python-tweepy-api-user-timeline-for-list-of-multiple-users-error
[3] https://gallery.azure.ai/Notebook/Computing-Influence-Score-for-Twitter-Users-1
[4] https://scikit-learn.org/stable/modules/generated/sklearn.neural_network.MLPClassifier.html
[5] https://towardsdatascience.com/deep-neural-networks-for-regression-problems-81321897ca33
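For concreteness, the influence-score components computed by feature_extraction.py might be combined along these lines; the equal weighting, the linear form, and the h-index variant below are assumptions for illustration, not the formula from the paper:

```python
# Illustrative combination of the four influence-score components.
# The weights and the linear form are assumptions, not the paper's actual formula.
def influence_score(social_reputation, content_score, tweet_credibility, index_score,
                    weights=(0.25, 0.25, 0.25, 0.25)):
    """Each component is assumed to be normalized to [0, 1]."""
    components = (social_reputation, content_score, tweet_credibility, index_score)
    return sum(w * c for w, c in zip(weights, components))

# h-index-style score over per-tweet engagement (retweets + likes): the largest
# h such that h tweets each received at least h engagements.
def h_index(engagements):
    counts = sorted(engagements, reverse=True)
    return sum(1 for i, c in enumerate(counts, start=1) if c >= i)
```

For example, a user whose five tweets received 10, 8, 5, 4, and 3 engagements gets an h-index-style score of 4, since four tweets each received at least four engagements.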
Keywords: Twitter, Influence Score, Classification, Twitter Data