In this thesis, various machine learning domains have been combined in order to build a video recommender system based on object detection. The work combines two extensively studied research fields, recommender systems and computer vision, both of which are also growing rapidly and are popular in commercial markets. To investigate the performance of the approach, three different content-based recommender systems have been implemented at Spotify. They are based on the following video features: object detections, titles and descriptions, and user preferences. These systems, together with a hybrid that combines their results, have then been evaluated and compared against each other. Two algorithms have been implemented, a prediction algorithm and a top-N algorithm, of which the former is the more reliable basis for evaluating the system's performance. The evaluation shows that the overall performance scores for predicting the users' liked and disliked videos range from about 40 % to 70 % for the prediction algorithm and from about 15 % to 70 % for the top-N algorithm. The approach based on object detection performs worse than the other approaches. Hence, there seems to be a low correlation between the user preferences and the video contents in terms of object detection data. This data therefore appears poorly suited for describing the content of videos and for use in the recommender system. However, the results of this study cannot be generalized to other systems until the approach has been evaluated in other environments and on various data sets. Moreover, there is plenty of room for refinements and improvements to the system, and there are many interesting areas for future research.