An Enhanced Framework for Urban Water Consumption Analysis: Feature Clustering with Ensemble Methods

Faye F.F. Jiang, Scholar, Hong Kong
Keywords: urban water, feature clustering, LightGBM, machine learning, urban planning

Abstract

Urban water consumption analysis presents significant challenges due to the complex interplay of socioeconomic, demographic, and built environment factors. This paper introduces a novel Feature Clustering Framework of TopK and Threshold with Ensemble Method (FCTTE), designed specifically for high-dimensional urban datasets. We evaluate the framework on a dataset of 1,120 features spanning eight domains of New York City's urban environment. Our experiments show that FCTTE outperforms conventional feature selection methods: it improves LightGBM classification accuracy by 4.6% over the baseline, whereas traditional methods achieve only a 1% improvement. The framework identifies median family income, energy usage intensity, adult male population, greenhouse gas emissions, and commercial building characteristics as the most influential factors affecting water consumption. By managing feature redundancy through hierarchical clustering and strategic selection, FCTTE gives urban planners interpretable insights for water resource management while maintaining superior predictive performance. This integrated approach bridges the gap between fragmented analyses of individual urban factors and the need for a holistic understanding of water consumption patterns in complex urban environments.
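
The abstract does not spell out how the clustering and selection steps work, so the following is only a minimal illustrative sketch of the general idea of redundancy-aware feature selection, not the FCTTE implementation. It assumes that features are grouped by average-linkage hierarchical clustering on the distance 1 - |r| (r being the Pearson correlation), that merging stops at a distance threshold, and that the top k features per cluster are kept by absolute correlation with the target. The function names, the toy feature names, and the parameter values are all hypothetical; the ensemble step and the LightGBM model are omitted.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def cluster_features(columns, threshold):
    """Average-linkage agglomerative clustering on the distance 1 - |r|.

    Merging stops once the closest pair of clusters is farther apart
    than `threshold` (an assumed stopping rule).
    """
    names = list(columns)
    dist = {(a, b): 1 - abs(pearson(columns[a], columns[b]))
            for a in names for b in names if a != b}
    clusters = [[name] for name in names]

    def linkage(c1, c2):
        return sum(dist[(a, b)] for a in c1 for b in c2) / (len(c1) * len(c2))

    while len(clusters) > 1:
        d, i, j = min((linkage(clusters[i], clusters[j]), i, j)
                      for i in range(len(clusters))
                      for j in range(i + 1, len(clusters)))
        if d > threshold:
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


def select_top_k(columns, target, clusters, k):
    """Keep the k features per cluster with the highest |r| against the target."""
    selected = []
    for cluster in clusters:
        ranked = sorted(cluster,
                        key=lambda name: abs(pearson(columns[name], target)),
                        reverse=True)
        selected.extend(ranked[:k])
    return selected


# Synthetic toy data: two independent signals, each with a near-duplicate feature.
random.seed(0)
n = 200
income = [random.gauss(0, 1) for _ in range(n)]
energy = [random.gauss(0, 1) for _ in range(n)]
columns = {
    "income": income,
    "income_copy": [v + random.gauss(0, 0.05) for v in income],
    "energy": energy,
    "energy_copy": [v + random.gauss(0, 0.05) for v in energy],
}
target = [a + 0.5 * b + random.gauss(0, 0.1) for a, b in zip(income, energy)]

clusters = cluster_features(columns, threshold=0.2)
selected = select_top_k(columns, target, clusters, k=1)
print(clusters)
print(selected)
```

On this toy data the near-duplicate features fall into the same cluster, so only one representative of each redundant pair is passed on to a downstream classifier. In practice the threshold and k would need tuning against validation accuracy.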

Published
2025-04-06
How to Cite
Jiang, F. F. (2025, April 6). An Enhanced Framework for Urban Water Consumption Analysis: Feature Clustering with Ensemble Methods. International Journal of Applied Science, 8(2), p22. https://doi.org/10.30560/ijas.v8n2p22
Section
Articles