
Objective: This study aimed to build a predictive model for Uber ridership in New York City.

Data: Uber ride data were downloaded from GitHub. The data were collected from January 2015 to June 2015 and contain ridership information for all five boroughs of New York City. A random 50% sample of the data was used to build the predictive model, and the other 50% was used to validate it. A bootstrap sample drawn from this held-out half was used for further validation. Mean squared errors (MSE) were calculated, and predicted riders were compared with observed riders to measure the performance of the predictive model. All analyses were performed using the free statistical software R.

Results: A total of 20,490 observations of hourly ridership were extracted for this study. Of these, 10,245 observations were assigned to the training sample for model building and 10,245 to the test sample. A bootstrap sample of 500,000 observations, drawn from the test sample, was also used for model validation. The observed ridership was 7,171,685 riders in the training sample, 7,092,530 in the test sample, and 345,936,605 in the bootstrap sample. The predicted ridership was 7,171,685 riders in the training sample, 7,134,560 in the test sample, and 348,357,233 in the bootstrap sample. The predictive model performed well both in the held-out test sample, which was not used for model building, and in the bootstrap sample: the MSE was 142,866 in the training sample, 141,142 in the test sample, and 141,480 in the bootstrap sample. Observed and predicted ridership were close to each other in each month from January to June, in each hour of the day, on each day of the week, and in each district of New York City.

Conclusions: A predictive model for Uber ridership in New York City was built and validated using publicly available data. It could be used to predict business opportunities for Uber and to support informed decisions about resource allocation.
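The split-sample and bootstrap validation described above can be sketched as follows. This is a minimal Python sketch under stated assumptions, not the authors' R code. The record layout `(borough, hour, riders)`, the per-(borough, hour) mean model standing in for the unspecified predictive model, and all function names (`split_half`, `fit_cell_means`, `evaluate`, `bootstrap`) are hypothetical. The demo rows are synthetic.

```python
import random
from collections import defaultdict
from statistics import fmean

# Hypothetical record layout: (borough, hour_of_day, riders).


def split_half(rows, seed=0):
    """Randomly assign half of the rows to training and the rest to testing."""
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    cut = len(shuffled) // 2
    return shuffled[:cut], shuffled[cut:]


def fit_cell_means(train):
    """Hypothetical stand-in model: mean riders per (borough, hour) cell,
    falling back to the overall training mean for cells unseen in training."""
    cells = defaultdict(list)
    for borough, hour, riders in train:
        cells[(borough, hour)].append(riders)
    means = {key: fmean(vals) for key, vals in cells.items()}
    overall = fmean(riders for _, _, riders in train)
    return lambda borough, hour: means.get((borough, hour), overall)


def mse(observed, predicted):
    """Mean squared error between paired observed and predicted values."""
    return fmean((o - p) ** 2 for o, p in zip(observed, predicted))


def evaluate(model, rows):
    """Return (observed total, predicted total, MSE) for a set of rows."""
    observed = [riders for _, _, riders in rows]
    predicted = [model(borough, hour) for borough, hour, _ in rows]
    return sum(observed), sum(predicted), mse(observed, predicted)


def bootstrap(rows, size, seed=0):
    """Resample rows with replacement to build a bootstrap sample."""
    return random.Random(seed).choices(rows, k=size)


if __name__ == "__main__":
    # Synthetic demo data only; this is not the Uber data set.
    rng = random.Random(42)
    boroughs = ["Manhattan", "Brooklyn", "Queens", "Bronx", "Staten Island"]
    rows = [(b, h, rng.randint(0, 500))
            for b in boroughs for h in range(24) for _ in range(10)]
    train, test = split_half(rows)
    model = fit_cell_means(train)
    boot = bootstrap(test, size=50_000)  # resampled from the test half only
    for name, sample in [("train", train), ("test", test), ("bootstrap", boot)]:
        obs, pred, err = evaluate(model, sample)
        print(f"{name}: observed={obs} predicted={pred:.0f} MSE={err:.0f}")
```

Because each cell mean reproduces its cell's training total, predicted and observed totals coincide on the training half. Any least-squares fit with an intercept has the same property. On the held-out and bootstrap rows the totals can differ, and the MSE summarizes the per-observation error.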
