Improving Coronary Heart Disease Prediction Using Random Forest with a Modified Minority Synthetic Oversampling Technique on an Imbalanced Dataset

International Journal of Electrical and Electronics Engineering
© 2025 by SSRG - IJEEE Journal
Volume 12 Issue 3
Year of Publication: 2025
Authors: M. Janaki Ramudu, K. Narasimha Raju, A. Krishna Mohan
How to Cite?
M. Janaki Ramudu, K. Narasimha Raju, A. Krishna Mohan, "Improving Coronary Heart Disease Prediction Using Random Forest with a Modified Minority Synthetic Oversampling Technique on an Imbalanced Dataset," SSRG International Journal of Electrical and Electronics Engineering, vol. 12, no. 3, pp. 100-113, 2025. Crossref, https://doi.org/10.14445/23488379/IJEEE-V12I3P111
Abstract:
Coronary Heart Disease (CHD) is the leading cause of death, and its fatality rate increases every year. Around 80 million females and 110 million males are afflicted by this illness worldwide. Early detection and accurate risk assessment of the disease remain crucial in medical research, and despite the work of many researchers the problem remains challenging. The proposed technique predicts CHD by applying the Modified Minority Synthetic Over-Sampling Technique (MMSOT) to balance the data, classifying the balanced data with a Random Forest (RF), and using grid search to fine-tune the hyperparameters. On the Comprehensive Heart Disease Dataset, the proposed technique achieved an accuracy of 94.84%, ROC-AUC of 98.15%, Sensitivity of 95.00%, Specificity of 94.70%, F1-Score of 94.61%, Precision (PPV) of 94.21%, and NPV of 95.42%, outperforming baseline models.
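The abstract does not describe how MMSOT modifies the sampling step, so the following is only a minimal sketch of the standard SMOTE idea that the technique builds on: each synthetic minority sample is placed at a random point on the segment between a minority sample and one of its nearest minority neighbours. The function name `smote_oversample`, the toy feature values, and the choice of `k` are assumptions made for illustration and do not come from the paper.

```python
import math
import random


def smote_oversample(minority, n_new, k=5, seed=0):
    """Standard SMOTE-style interpolation (not the paper's MMSOT).

    Returns n_new synthetic points, each lying between a randomly chosen
    minority sample and one of its k nearest minority neighbours.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base sample, excluding itself
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Hypothetical toy data: two features per record, 6 negatives vs 3 positives.
majority = [(0.1, 0.2), (0.2, 0.1), (0.3, 0.3), (0.2, 0.4), (0.4, 0.2), (0.1, 0.4)]
minority = [(0.8, 0.9), (0.9, 0.7), (0.7, 0.8)]

# Add just enough synthetic positives to match the majority class size.
new_points = smote_oversample(minority, n_new=len(majority) - len(minority), k=2)
balanced_minority = minority + new_points
print(len(majority), len(balanced_minority))
```

In a full pipeline of the kind the abstract outlines, oversampling like this would normally be applied to the training split only, before fitting the classifier and searching over its hyperparameters, so that synthetic samples do not leak into the evaluation data.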
Keywords:
Coronary Heart Disease, Grid Search, Machine Learning Techniques, MMSOT, SMOTE.