
arXiv: 1902.00562
Abstract: Successfully predicting gentrification could have many social and commercial applications; however, real estate sales are difficult to predict because they belong to a chaotic system composed of intrinsic and extrinsic characteristics, perceived value, and market speculation. Using New York City real estate as our subject, we combine modern techniques of data science and machine learning with traditional spatial analysis to create robust real estate prediction models for both classification and regression tasks. We compare several cutting-edge machine learning algorithms across spatial, semispatial, and nonspatial feature engineering techniques, and we empirically show that spatially conscious machine learning models outperform nonspatial models when married with advanced prediction techniques such as random forests, generalized linear models, gradient boosting machines, and artificial neural networks.
random forests, FOS: Computer and information sciences, Computer Science - Machine Learning, spatial analysis, real estate, Statistics, Machine Learning (stat.ML), generalized linear models, Computer science, gradient boosting, Machine Learning (cs.LG), feature engineering, machine learning, Statistics - Machine Learning, artificial neural networks
| selected citations | 18 |
| popularity | Top 10% |
| influence | Average |
| impulse | Top 10% |
