Gradient Boosting Decision Tree for House Price Prediction with Google Trends

  • Faye F.F. Jiang, Scholar, Hong Kong
Keywords: house price prediction, growth rate of house price index, Google Trends, gradient boosting decision tree, recursive feature elimination

Abstract

Accurate house price predictions reflect the popularity of the housing market and help investors and policymakers make decisions. Statistics on macro factors are commonly used for house price forecasting; however, macro factors obtained from government reports suffer from time lags, which may impair prediction performance. Google Trends data can serve as a leading sentiment indicator of people’s attitudes and expectations toward the housing market and can help improve house price prediction. This study therefore proposes a new methodological framework for house price prediction with Google Trends data. Recursive Feature Elimination (RFE), a feature selection method, is used to remove noisy data and improve feature quality. Gradient Boosting Decision Tree (GBDT) is adopted to build the house price forecasting models. Real estate-related Google Trends data, along with fundamental house price index (HPI) data, are collected to predict the growth rate of the HPI in the United States. The results show that RFE effectively removes irrelevant features and improves model performance. GBDT achieves higher and more stable prediction accuracy than the other prediction models, especially when the prediction horizon is long. Compared with models that use fundamental HPI data only, models that include Google Trends data exhibit higher and more stable prediction accuracy for long-horizon forecasting. Three categories of Google Trends indices, “house rent”, “housing market & real estate market”, and “mortgage & real estate agency”, are found to be the most important indicators of variation in the HPI growth rate.
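
The combination of RFE and GBDT described in the abstract can be illustrated with a minimal, dependency-free sketch. It is not the study's implementation: the data are synthetic, the trees are reduced to one-split stumps, and the column meanings (a lagged HPI growth rate, one Google Trends index, two noise series), the helper names, and the hyperparameters are all assumptions made for illustration.

```python
import random

def fit_stump(X, residuals):
    """Best single-feature threshold split (least squared error) on the residuals."""
    best = None  # (sse, feature, threshold, left_mean, right_mean)
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [r for row, r in zip(X, residuals) if row[j] <= t]
            right = [r for row, r in zip(X, residuals) if row[j] > t]
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
            if best is None or sse < best[0]:
                best = (sse, j, t, lm, rm)
    return best

def fit_gbdt(X, y, n_rounds=20, lr=0.3):
    """Gradient boosting for squared loss: each stump fits the current residuals."""
    base = sum(y) / len(y)
    pred = [base] * len(y)
    stumps, importance = [], [0.0] * len(X[0])
    for _ in range(n_rounds):
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        sse, j, t, lm, rm = fit_stump(X, residuals)
        importance[j] += sum(r * r for r in residuals) - sse  # error reduction of this split
        stumps.append((j, t, lm, rm))
        pred = [p + lr * (lm if row[j] <= t else rm) for p, row in zip(pred, X)]

    def predict(row):
        return base + sum(lr * (lm if row[j] <= t else rm) for j, t, lm, rm in stumps)

    return predict, importance

def rfe(X, y, n_keep):
    """Refit repeatedly, dropping the least important remaining feature each time."""
    kept = list(range(len(X[0])))
    while len(kept) > n_keep:
        _, importance = fit_gbdt([[row[j] for j in kept] for row in X], y)
        kept.pop(importance.index(min(importance)))
    return kept

random.seed(0)
# Hypothetical columns: 0 = lagged HPI growth, 1 = a Google Trends index, 2-3 = pure noise.
X = [[random.uniform(-1, 1) for _ in range(4)] for _ in range(60)]
y = [row[0] + 0.7 * row[1] + random.gauss(0, 0.02) for row in X]

kept = rfe(X, y, n_keep=2)
predict, _ = fit_gbdt([[row[j] for j in kept] for row in X], y)
train_mse = sum((predict([row[j] for j in kept]) - yi) ** 2 for row, yi in zip(X, y)) / len(y)
print("kept feature indices:", kept)
print("training MSE:", round(train_mse, 4))
```

On this synthetic data the two noise columns should be the ones eliminated, since they contribute almost nothing to the error reduction. In practice a library implementation with deeper trees and out-of-sample evaluation over the forecasting horizon would be the more realistic choice.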

Figure: Methodology framework
Published: 2025-04-05
Section: Articles