Use of search engine optimization factors for Google page rank prediction
Computer and Information Science
Over the years, search engines have become an important tool for finding information. Users are known to select a link on the first page of search results in 62% of cases. Search engine optimization techniques improve a website and thereby its ranking in search engines. Search engine owners do not disclose the exact factors that affect website ranking. In this thesis we selected some of the most frequently mentioned search engine optimization factors for the Google search engine. Using these factors as features, we applied machine learning methods to build a model that predicts whether a site will rank among the top 10 search results (i.e. on the first page of search engine results). The best results were achieved with the random forest classification method, but the obtained AUC was below the values considered acceptable for such problems. We also searched for statistically significant informative features. Only a few features met the criteria, and they carried very little information. Better results might be achieved by using other features and increasing the number of training examples.
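
Since the evaluation above rests on AUC, a minimal sketch of how that measure can be computed for a top-10 classifier may help. It is not the thesis's implementation: the function name `auc`, the labels (1 = site ranked in the top 10, 0 = not) and the scores are hypothetical illustration values. The sketch uses the pairwise definition of AUC, the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, with ties counted as one half.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of scores.

    labels: 1 for a positive example (e.g. ranked in the top 10), 0 otherwise.
    scores: classifier outputs, higher meaning "more likely positive".
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative example")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive correctly scored above negative
            elif p == n:
                wins += 0.5   # ties count as half a win
    return wins / (len(positives) * len(negatives))


# Hypothetical labels and scores, for illustration only.
labels = [1, 0, 1, 0, 0, 1]
scores = [0.9, 0.4, 0.35, 0.2, 0.6, 0.7]
example_auc = auc(labels, scores)
print(f"AUC = {example_auc:.3f}")
```

An AUC of 0.5 corresponds to random guessing and 1.0 to a perfect separation of the two classes, which is the scale against which the random forest result is judged.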