handle: 2117/100796
In recent years, more and more people have been connecting through social networks, and Twitter is one of the most widely used. The huge amount of information these networks generate is attracting the interest of companies, which can use it to detect public opinion about their brands and thus improve their business value. Transforming the information present in social networks into knowledge requires several steps. This project aims to describe them and to provide tools able to perform this task. The first problem is how to retrieve the data: several ways are available, each with its own pros and cons. It is then necessary to study and define proper queries in order to retrieve the information needed. Once the data has been retrieved, it may need to be filtered and explored. For this task a topic model algorithm, Latent Dirichlet Allocation (LDA), has been studied and analyzed. LDA has shown positive results when it is properly tuned and combined with appropriate visualization techniques. The difference between a topic model and other clustering/segmentation techniques is that a topic model allows each "document" (instance) to belong to more than one topic (cluster). LDA does not natively work well on Twitter because tweets are very short; a review of the literature has revealed a solution to this problem. Another problem, common in clustering, is how to validate the algorithm and how to choose the proper number of topics (clusters); several metrics from the literature have been explored for this purpose. Afterwards, Sentiment Analysis techniques can be applied to measure the opinion of the users. The literature presents several approaches to this problem. This work focuses on the Polarity Detection task with three classes, that is, classifying whether a tweet expresses a positive, a negative, or a neutral sentiment.
Reaching accurate results here can be challenging, due to the messy nature of Twitter posts. Several approaches have been tested and compared. The baseline method is the use of sentiment dictionaries. Then, since the real sentiment of the Twitter posts is not available, a sample has been manually labeled, and several supervised approaches combined with various feature selection/transformation techniques have been tested. Finally, a new experimental approach, inspired by the soft-labeling technique present in the literature, has been defined and tested. This method tries to avoid the costly task of manually labeling a sample in order to validate a model. In the literature this problem is solved for the two-class case, considering only positive and negative tweets. This work tries to extend the soft-labeling approach to the three-class problem.
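As an illustration of the dictionary-based baseline mentioned above, the following is a minimal sketch of three-class polarity detection in Python. It is not the implementation used in this work: the word lists `POSITIVE` and `NEGATIVE` and the helpers `tokenize` and `polarity` are hypothetical, and a real baseline would presumably load a published sentiment dictionary rather than a handful of hand-picked words.

```python
import re

# Hypothetical toy lexicons (assumption): stand-ins for a real sentiment dictionary.
POSITIVE = {"good", "great", "love", "happy", "excellent"}
NEGATIVE = {"bad", "awful", "hate", "sad", "terrible"}


def tokenize(tweet: str) -> list:
    """Lowercase the tweet, drop URLs and @mentions, and return alphabetic tokens."""
    cleaned = re.sub(r"https?://\S+|@\w+", " ", tweet.lower())
    return re.findall(r"[a-z']+", cleaned)


def polarity(tweet: str) -> str:
    """Classify a tweet as positive, negative, or neutral by counting lexicon hits."""
    tokens = tokenize(tweet)
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    # No hits, or positive and negative hits cancel out.
    return "neutral"
```

A tweet with no dictionary hits, or with as many positive as negative hits, falls into the neutral class. This is one simple way to obtain three classes from a two-sided dictionary, and it also hints at why such a baseline can struggle on noisy tweets with slang, misspellings, or sarcasm.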
Keywords: Information Retrieval, Topic Modeling, Twitter, Sentiment Analysis, Microblogs Analysis, UPC subject areas::Computer Science (Àrees temàtiques de la UPC::Informàtica)
| Indicator | Description | Value |
| --- | --- | --- |
| citations | An alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically). | 0 |
| popularity | Reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network. | Average |
| influence | Reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically). | Average |
| impulse | Reflects the initial momentum of an article directly after its publication, based on the underlying citation network. | Average |
| views | | 61 |
| downloads | | 313 |

Views and downloads provided by UsageCounts