Using Machine Learning to Predict Student Performance
student performance | machine learning | regression | naïve Bayes classification | decision trees | Tietojenkäsittelytieteiden tutkinto-ohjelma - Degree Programme in Computer Sciences
This thesis examines the use of machine learning algorithms to predict whether a student will succeed. Its specific focus is a comparison of machine learning methods and feature engineering techniques in terms of how much each improves prediction performance.
Three machine learning methods were used: linear regression, decision trees, and naïve Bayes classification. Feature engineering, the process of modifying and selecting the features of a data set, was used to improve the predictions made by these learning algorithms.
Two data sets containing student records were used. The machine learning methods were applied to both the raw and the feature-engineered version of each data set to predict student success.
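The raw-versus-engineered comparison described above can be illustrated with a minimal sketch. It is not the thesis's code: it covers only one of the three methods (a small categorical naïve Bayes with add-one smoothing, written from scratch), and the records, the column layout (two grades and an absence count), the thresholds, and the helper names `train_nb`, `predict_nb`, `engineer`, and `accuracy` are all assumptions made for illustration.

```python
from collections import Counter, defaultdict
import math


def train_nb(rows, labels):
    """Fit a categorical naive Bayes model by counting feature values per class."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (feature index, label) -> value counts
    vocab = defaultdict(set)             # feature index -> distinct values seen
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[(i, label)][value] += 1
            vocab[i].add(value)
    return class_counts, value_counts, vocab


def predict_nb(model, row):
    """Return the class with the highest smoothed log-probability."""
    class_counts, value_counts, vocab = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in class_counts.items():
        score = math.log(count / total)
        for i, value in enumerate(row):
            seen = value_counts[(i, label)][value]
            # Add-one smoothing, with one extra slot reserved for unseen values.
            score += math.log((seen + 1) / (count + len(vocab[i]) + 1))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def engineer(record):
    """Hypothetical feature engineering: bin the mean grade and the absences."""
    g1, g2, absences = record
    grade_band = "low" if (g1 + g2) / 2 < 10 else "high"
    absence_band = "many" if absences > 5 else "few"
    return (grade_band, absence_band)


def accuracy(model, rows, labels):
    hits = sum(predict_nb(model, r) == y for r, y in zip(rows, labels))
    return hits / len(labels)


# Synthetic records (grade 1, grade 2, absences); invented for illustration only.
train_rows = [(12, 14, 2), (15, 13, 0), (11, 12, 4), (16, 17, 1),
              (6, 8, 9), (7, 5, 12), (9, 8, 7), (8, 9, 3)]
train_labels = ["pass"] * 4 + ["fail"] * 4
test_rows = [(13, 15, 3), (10, 11, 6), (5, 7, 10), (9, 9, 8)]
test_labels = ["pass", "pass", "fail", "fail"]

# Same learner, two versions of the same data.
raw_model = train_nb(train_rows, train_labels)
raw_acc = accuracy(raw_model, test_rows, test_labels)

eng_model = train_nb([engineer(r) for r in train_rows], train_labels)
eng_acc = accuracy(eng_model, [engineer(r) for r in test_rows], test_labels)

print(f"raw features: {raw_acc:.2f}, engineered features: {eng_acc:.2f}")
```

The toy data is constructed so that raw values rarely recur between training and test records, while the binned features do. It shows the shape of the comparison only and says nothing about the accuracies reported in the thesis.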
The thesis reaches the same conclusion as earlier studies: student performance can be predicted successfully using machine learning. The best algorithm was naïve Bayes classification for the first data set, with 98 percent accuracy, and decision trees for the second data set, with 78 percent accuracy. For the data used in this study, feature engineering was found to be a more important factor in prediction performance than method selection.