Analyzing Factors Affecting the Water Quality of the Ganges and Predicting Further Patterns

International Journal of Computer Science and Engineering
© 2025 by SSRG - IJCSE Journal
Volume 12 Issue 3
Year of Publication: 2025
Authors: Arush Nath
How to Cite?
Arush Nath, "Analyzing Factors Affecting the Water Quality of the Ganges and Predicting Further Patterns," SSRG International Journal of Computer Science and Engineering, vol. 12, no. 3, pp. 1-9, 2025. Crossref, https://doi.org/10.14445/23488387/IJCSE-V12I3P101
Abstract:
Water quality in India is degrading, and established methods for forecasting it are lacking. This study used publicly available data from the Central Pollution Control Board (CPCB) website to test several machine learning algorithms and determine which is most viable for predicting water quality metrics, specifically Dissolved Oxygen (D.O.). The algorithms were compared by R² score and included XGBoost, Support Vector Regressor, Gradient Boosting, Random Forest, and Linear Regression. Random Forest was the most accurate model, with an R² score of 0.76, which makes it a viable means of forecasting future patterns in water quality metrics.
Keywords:
Dissolved Oxygen (D.O.), Machine Learning, CPCB, Hyperparameter, Forecast.
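
The R² comparison described in the abstract can be illustrated with a minimal sketch. It is not the study's code: the D.O. readings, the two sets of predictions, and the names `model_a` and `model_b` are made-up values chosen only to show how the score ranks models. R² is computed from its standard definition, 1 - SS_res / SS_tot.

```python
from statistics import mean


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_bar = mean(y_true)
    # Residual sum of squares: error left over after the model's predictions
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: error of always predicting the mean
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical D.O. readings in mg/L (illustrative values, not CPCB data)
observed = [6.0, 7.0, 8.0, 9.0]

# Hypothetical predictions from two placeholder models
predictions = {
    "model_a": [6.5, 7.0, 8.0, 8.5],  # tracks the readings closely
    "model_b": [7.5, 7.5, 7.5, 7.5],  # always predicts the mean reading
}

# Score every model on the same observations and pick the highest R²
scores = {name: r2_score(observed, pred) for name, pred in predictions.items()}
best = max(scores, key=scores.get)

for name, score in scores.items():
    print(f"{name}: R2 = {score:.2f}")
print("best:", best)
```

A model that always predicts the mean scores 0, and a perfect model scores 1, so a score such as the 0.76 reported for Random Forest means the model accounts for roughly three quarters of the variance in the target.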