Presentation . 2021
License: CC 0
Data sources: VTechWorks

YouTube Video Analysis

Authors: Bachubhay, Akhil; Chhour, Danny; Deng, Heji; Tran, Trung

Abstract

YouTube (youtube.com) is an online video-sharing platform that allows users to upload, view, rate, share, report, and comment on videos, add them to playlists, and subscribe to other users. Over 2 billion logged-in users visit YouTube each month, and every day people watch over a billion hours of video and generate billions of views. User-generated content (UGC) makes up a good portion of the content available on YouTube, and more and more people post videos there, many of whom become well-known YouTubers. A notable trend for these YouTubers is how their channels grow over time. We were tasked with analyzing how certain YouTubers become successful over time, how their early videos differ from later ones in terms of scripts, and how comments change with fame.

Such analysis requires two sets of data. The first is numerical channel data: view counts, likes and dislikes, publication dates, the interactions between the video creator and the audience, etc. The second is textual data: the auto-generated transcripts of the videos and the comments from users. With the help of the YouTube APIs and other available helper tools, we scrape video metadata and output it as CSV files for future studies. For the analysis, we generate scatter plots in which each dot represents one video, the x-axis is the publication date, the y-axis is the view count, and the dot color encodes another metric (for instance, video duration). With the Python NLTK package, we analyze the video transcripts and comments to see which words are spoken most, which words appear frequently in the comments and whether they are positive or negative, how many words the creator says per minute, etc. Combining these data, we can generate a more thorough scatter plot to discover whether there is a pattern in how certain YouTubers become more and more successful.
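The transcript measures mentioned in the abstract (most frequent words and words spoken per minute) can be illustrated with a minimal sketch. This is not the project's code: it uses only the Python standard library as a stand-in for NLTK's tokenizer and frequency counts, and the function names, the tiny stop-word list, and the sample transcript are all assumptions made for illustration.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list; NLTK ships a much fuller one.
STOP_WORDS = {"a", "an", "and", "the", "to", "of", "in", "is", "it", "this", "we"}


def tokenize(text):
    """Lowercase the text and return its word tokens (letters and apostrophes)."""
    return re.findall(r"[a-z']+", text.lower())


def top_words(text, n=3):
    """Return the n most frequent non-stop-word tokens as (word, count) pairs."""
    counts = Counter(tok for tok in tokenize(text) if tok not in STOP_WORDS)
    return counts.most_common(n)


def words_per_minute(text, duration_seconds):
    """Return the speaking rate: all tokens (stop words included) per minute."""
    if duration_seconds <= 0:
        raise ValueError("duration_seconds must be positive")
    return len(tokenize(text)) * 60 / duration_seconds


if __name__ == "__main__":
    # Made-up transcript snippet and duration, not data from the project.
    transcript = (
        "Welcome back to the bridge game. "
        "Today we build a bridge and test the bridge."
    )
    print(top_words(transcript, n=1))
    print(words_per_minute(transcript, duration_seconds=30))
```

Computing both values per video would give one row per upload, which could then be joined with the publication date and view count to color or size the dots in the scatter plots described above.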

This project was developed using data solely from one channel called Biffa Plays Indie Games as the basis, but it is expected to function correctly when used on other channels as well. The two versions of the final report are in YouTubeVideoAnalysisReport.docx (Word) and YouTubeVideoAnalysisReport.pdf (PDF). The two versions of the final presentation are in YouTubeVideoAnalysisPresentation.pptx (PowerPoint) and YouTubeVideoAnalysisPresentation.pdf (PDF).

Country: United States
Keywords

Data Analysis, Web Scraper, YouTube, Data Collection, Plotly, NLTK, Frequency Count, Transcripts, Social Media, Comments, Python, Jupyter Notebook
