Investigation on Tourism Trends Using K-means Clustering and Regression Analysis

  • Anna Sheila Ilumin Crisostomo, Department of Tourism and Management, Oman Tourism College, Muscat, Sultanate of Oman
  • Badar Al Dhuhli, Department of Tourism and Management, Oman Tourism College, Muscat, Sultanate of Oman
  • Reggie C. Gustilo, Department of Electronics and Computer Engineering, De La Salle University, Manila, Philippines
Keywords: tourism trends, k-means clustering, regression analysis, forecasting

Abstract

Purpose: This study analyzes tourism trends by grouping tourists into clusters based on shared characteristics. Three (3) characteristics were explored using k-means clustering, namely tourists’ demographics, travel patterns, and travel preferences. The clusters were based on each individual’s age, gender, country of origin, frequency of travel, travel destinations, and travel seasons. Regression analysis was also performed to determine the factors that influence tourists’ length of stay at their travel destinations.

Methodology: A survey was conducted among 150 respondents of different age groups, genders, and nationalities. The survey asked about frequency of travel per year, length of stay per trip, travel season, destinations, purpose of travel, and preferred booking method. The collected dataset was used to characterize clusters of tourists with common attributes. Additionally, regression analysis was used to identify the predictors that influence tourists’ length of stay.
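The clustering step can be illustrated with a minimal sketch. It is not the authors' code: the respondents are synthetic, the two features (age and trips per year) are a hypothetical simplification of the survey variables, and the helper names `standardise`, `kmeans`, `make_survey`, and `inertia_by_k` are invented for this example. It uses only the Python standard library; in practice a library such as scikit-learn would normally be used. The within-cluster sum of squares is printed for several values of k so that an elbow (the point where further clusters stop reducing it substantially) can be read off.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def standardise(rows):
    """Z-score each column so that no single feature dominates the distance."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [(sum((v - m) ** 2 for v in c) / len(c)) ** 0.5 or 1.0
            for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(r, means, stds)] for r in rows]


def kmeans(points, k, n_iter=100, seed=0):
    """Lloyd's algorithm; returns (centroids, labels, within-cluster SS)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j]))
                      for p in points]
        if new_labels == labels:
            break  # assignments are stable, so the centroids are too
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    inertia = sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))
    return centroids, labels, inertia


def make_survey(n_per_group=50, seed=0):
    """Hypothetical respondents as [age, trips_per_year]; NOT the study's data."""
    rng = random.Random(seed)
    centres = [(24, 2.0), (38, 5.0), (58, 1.0)]
    return [[rng.gauss(age, 3.0), rng.gauss(trips, 0.6)]
            for age, trips in centres for _ in range(n_per_group)]


def inertia_by_k(points, k_values):
    """Within-cluster sum of squares for each candidate k (the elbow curve)."""
    return {k: kmeans(points, k)[2] for k in k_values}


X = standardise(make_survey())
for k, wcss in inertia_by_k(X, range(1, 9)).items():
    print(f"k={k}  within-cluster SS={wcss:.2f}")
```

Categorical survey answers such as gender or country of origin would need to be encoded numerically before a distance-based method like this can use them; the encoding choice is left open here because the abstract does not state which one was used.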

Findings: K-means clustering was performed on three (3) categories: tourists’ demographic profiles, travel patterns, and travel preferences. Regression analysis was likewise employed to predict visitors’ length of stay using age, gender, purpose of travel, travel season, and preferred destination as independent variables. For participants’ demographics, the number of clusters generated was k=5. Gender and nationality were found to be randomly distributed across clusters, while the other parameters were grouped according to age group and frequency of travel. For tourists’ travel patterns, age, gender, country of origin, frequency of stay, purpose of travel, length of stay, and travel season were used as parameters; the knee point of the elbow method indicated k=6 as the optimal number of clusters. For travel preferences, gender, age, country of origin, frequency of travel, purpose of travel, travel season, and length of stay were used as predictors, and the optimal number of clusters was k=5. Regression analysis revealed gender, age, and purpose of travel as significant factors influencing tourists’ average length of stay. This combination of variables produced the lowest mean squared error (MSE=0.64).
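Comparing predictor combinations by MSE, as in the regression finding above, can be sketched as follows. This is an illustration under stated assumptions, not the authors' procedure: the abstract does not specify the regression type or the validation scheme, so ordinary least squares and a simple hold-out split are assumed. The data are synthetic, and the coefficients, the 0/1 coding of gender, purpose, and season, and all helper names are hypothetical.

```python
import random
from itertools import combinations

FEATURES = ["age", "gender", "purpose", "season"]  # hypothetical coding


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def fit_ols(X, y):
    """Ordinary least squares via the normal equations; beta[0] is the intercept."""
    Xb = [[1.0] + list(r) for r in X]
    p = len(Xb[0])
    XtX = [[sum(r[i] * r[j] for r in Xb) for j in range(p)] for i in range(p)]
    Xty = [sum(r[i] * t for r, t in zip(Xb, y)) for i in range(p)]
    return solve(XtX, Xty)


def predict(beta, X):
    return [beta[0] + sum(b * v for b, v in zip(beta[1:], r)) for r in X]


def mse(y_true, y_pred):
    return sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)


def make_data(n=150, seed=0):
    """Synthetic respondents; length of stay depends on age, gender, purpose only."""
    rng = random.Random(seed)
    rows, stay = [], []
    for _ in range(n):
        age = rng.uniform(18, 65)
        gender, purpose, season = (rng.randint(0, 1) for _ in range(3))
        rows.append([age, gender, purpose, season])
        stay.append(2.0 + 0.05 * age + 0.8 * gender + 1.5 * purpose
                    + rng.gauss(0, 0.8))
    return rows, stay


def evaluate_subsets(rows, y, n_train=100):
    """Hold-out MSE for every non-empty subset of FEATURES."""
    results = {}
    for r in range(1, len(FEATURES) + 1):
        for subset in combinations(range(len(FEATURES)), r):
            X = [[row[i] for i in subset] for row in rows]
            beta = fit_ols(X[:n_train], y[:n_train])
            names = tuple(FEATURES[i] for i in subset)
            results[names] = mse(y[n_train:], predict(beta, X[n_train:]))
    return results


rows, stay = make_data()
results = evaluate_subsets(rows, stay)
best = min(results, key=results.get)
print("lowest hold-out MSE:", best, round(results[best], 3))
```

With only 150 respondents, a single hold-out split gives a noisy ranking of subsets; cross-validation would be a more stable alternative under the same assumptions.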

Research limitations/implications: A limited dataset of 150 respondents, mainly from Asia and the Middle East, was used for this preliminary analysis of tourism trends. The predictors used in the analysis were restricted to gender, age, country of origin, travel frequency, length of stay, travel season, and travel destinations. Supplementary parameters can be considered in a big data setting for similar studies in the future. K-means clustering was selected over other algorithms to group tourists by common attributes, while regression analysis was employed to determine the factors influencing tourists’ length of stay in their destinations.

Social Implications: The results of this study will support individual tourists in identifying trends in various travel destinations. Similarly, business owners can benefit by forecasting travellers’ requirements such as accommodation, food, and services. The research findings likewise support informed decision-making by stakeholders.

Originality / Value: The dataset comprised participants from different countries and nationalities, including the Philippines, Saudi Arabia, United Arab Emirates, Oman, USA, Portugal, Germany, Malaysia, Thailand, Qatar, Finland, Denmark, Spain, Taiwan, South Korea, Singapore, Australia, Austria, England, UK, India, and China. The code was written in Python, and the analyses and interpretations were based on the formulated objectives. K-means clustering and regression analysis were both employed: the former to present varied clusters according to tourists’ demographic profiles, travel patterns, and preferences, and the latter to identify the factors used to predict tourists’ length of stay in their preferred destinations.

Published: 2025-05-08
Section: Articles