Predicting Readability of Health Educational Resources for Children Using Semantic Features

  • Yanmeng Liu, School of Languages and Cultures, The University of Sydney, Australia
Keywords: health education materials, children, readability, semantic, machine learning

Abstract

The success of health education resources depends largely on their readability: target readers can understand and accept health information only when it is presented at an appropriate level of reading difficulty. Unlike other populations, children have limited background knowledge and still-developing reading comprehension, which poses additional challenges for readability research on health education resources. This research explores readability prediction for children's health education resources by using semantic features to train machine learning algorithms. A data-driven method was applied: 1,000 health education articles were collected from the websites of international health organizations and grouped, according to their sources, into resources for kids and resources for non-kids. Seventy-three semantic features were then used to train five machine learning algorithms (decision tree, support vector machine, k-nearest neighbors, ensemble classifier, and logistic regression). The results showed that the k-nearest neighbors algorithm and the ensemble classifier outperformed the other algorithms in terms of area under the receiver operating characteristic curve, sensitivity, specificity, and accuracy, and achieved good performance in predicting whether the readability of health education resources is suitable for children.
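The four evaluation measures named in the abstract can be illustrated with a minimal sketch. It assumes a binary labelling in which 1 stands for "resource for kids" and 0 for "resource for non-kids"; the function names, the 0.5 decision threshold, and the toy labels and scores are all hypothetical and are not taken from the study.

```python
from typing import Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tp, fp, tn, fn), treating label 1 ("for kids") as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn


def sensitivity(tp: int, fn: int) -> float:
    """Share of true positives among all actual positives."""
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """Share of true negatives among all actual negatives."""
    return tn / (tn + fp)


def accuracy(tp: int, fp: int, tn: int, fn: int) -> float:
    """Share of correct predictions among all predictions."""
    return (tp + tn) / (tp + fp + tn + fn)


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve, computed as the probability that a randomly
    chosen positive receives a higher score than a randomly chosen negative
    (ties count as one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical, for illustration only).
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.2, 0.7, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold of 0.5

tp, fp, tn, fn = confusion_counts(y_true, y_pred)
print("sensitivity:", sensitivity(tp, fn))
print("specificity:", specificity(tn, fp))
print("accuracy:", accuracy(tp, fp, tn, fn))
print("AUC:", roc_auc(y_true, scores))
```

Sensitivity, specificity, and accuracy depend on the chosen threshold, whereas the area under the curve summarises ranking quality across all thresholds, which is one reason the measures are commonly reported together.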

Published: 2021-04-23
Section: Articles