Estimating Groundwater Nitrate Contamination Probability Using Extreme Gradient Boosting

Document Type: Research

Authors

1 Assistant Professor, Soil conservation and watershed management research department, Isfahan Agricultural and Natural Resources Research and Education Center, AREEO, Isfahan, Iran

2 Assistant Professor, Soil conservation and watershed management research department, Khorasan Razavi Agricultural and Natural Resources Research and Education Center, AREEO, Mashhad, Iran

10.22092/wmrj.2025.370560.1631

Abstract

Introduction and Goal
Groundwater is a vital source of fresh water that supplies drinking, agricultural, and industrial needs in many arid and semi-arid regions of the world. However, growing human and industrial activity has intensified the pollution of this valuable resource. Nitrate, owing to its high solubility and mobility in water, is recognized as one of the most serious threats to human health and aquatic ecosystems. Consuming nitrate-contaminated water can lead to various diseases, including methemoglobinemia (blue baby syndrome) in infants and some cancers in adults. Furthermore, nitrate entering surface waters can cause eutrophication and degrade aquatic ecosystems. Given the need to protect groundwater resources, this research aimed to develop an integrated framework for estimating the probability of groundwater contamination, with a focus on nitrate, in the Lenjanat Plain, Isfahan Province, Iran. Using this framework together with machine-learning modeling and spatial analysis, areas prone to contamination were identified, which supports effective management measures to reduce the risks associated with groundwater contamination. The results can serve as a basis for future planning in the sustainable management of water resources and the protection of community health.
Materials and Methods
In this study, groundwater nitrate concentration data were collected from 102 wells in the Lenjanat Plain of Isfahan Province, each well representing the nitrate status of the aquifer in the study region. To analyze these data and extract hidden patterns, the Extreme Gradient Boosting (XGBoost) model was used. This model was chosen for its ability to identify complex, non-linear relationships between variables and for its acceptable predictive accuracy. In addition to nitrate concentration data, ten key environmental and anthropogenic factors potentially influencing nitrate contamination in groundwater were identified and incorporated into the model. These factors included slope, elevation, drainage density, topographic wetness index, soil order, distance from streams, lithology, and land use. Integrating these factors into the XGBoost model made it possible to identify the most significant factors affecting nitrate contamination and to predict the spatial probability of nitrate contamination in groundwater.
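As a rough illustration of the kind of model described above, the following is a minimal pure-Python sketch of second-order (Newton) gradient boosting with one-split trees and L2-regularized leaf weights, which is the core idea behind Extreme Gradient Boosting. It is not the authors' implementation and it is not the XGBoost library: the function names, the hyperparameters (`n_rounds`, `eta`, `lam`), the two predictors, and the 102 synthetic "wells" are all assumptions made up for the example.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def best_stump(X, grad, hess, lam):
    """Return the one-split tree with the largest regularized second-order gain."""
    G, H = sum(grad), sum(hess)
    best = None  # (gain, feature, threshold, left_weight, right_weight)
    for j in range(len(X[0])):
        order = sorted(range(len(X)), key=lambda i: X[i][j])
        gl = hl = 0.0
        for pos in range(len(order) - 1):
            i = order[pos]
            gl += grad[i]
            hl += hess[i]
            if X[i][j] == X[order[pos + 1]][j]:
                continue  # cannot split between identical values
            gr, hr = G - gl, H - hl
            gain = gl * gl / (hl + lam) + gr * gr / (hr + lam) - G * G / (H + lam)
            if best is None or gain > best[0]:
                # Optimal leaf weight under L2 regularization: -G / (H + lam)
                best = (gain, j, X[i][j], -gl / (hl + lam), -gr / (hr + lam))
    return best


def fit(X, y, n_rounds=30, eta=0.3, lam=1.0):
    """Boost stumps on the logistic loss; returns a list of (feature, thr, wl, wr)."""
    stumps = []
    scores = [0.0] * len(X)
    for _ in range(n_rounds):
        p = [sigmoid(s) for s in scores]
        grad = [pi - yi for pi, yi in zip(p, y)]  # first derivative of log loss
        hess = [pi * (1.0 - pi) for pi in p]      # second derivative of log loss
        stump = best_stump(X, grad, hess, lam)
        if stump is None:
            break
        _, j, thr, wl, wr = stump
        stumps.append((j, thr, eta * wl, eta * wr))
        for i, row in enumerate(X):
            scores[i] += eta * (wl if row[j] <= thr else wr)
    return stumps


def predict_proba(stumps, row):
    """Estimated contamination probability for one location."""
    return sigmoid(sum(wl if row[j] <= thr else wr for j, thr, wl, wr in stumps))


# Synthetic stand-in data (NOT the study's data): 102 hypothetical wells with
# two made-up predictors scaled to [0, 1]; a well is labelled contaminated
# when the first predictor exceeds 0.5.
rng = random.Random(0)
X = [[rng.random(), rng.random()] for _ in range(102)]
y = [1 if row[0] > 0.5 else 0 for row in X]

model = fit(X, y)
probs = [predict_proba(model, row) for row in X]
train_accuracy = sum((p > 0.5) == (t == 1) for p, t in zip(probs, y)) / len(y)
```

Applying `predict_proba` to the predictor values of every grid cell in a study area would yield a probability map of the kind the study describes. In practice a tested library implementation and held-out validation data would be preferable to this sketch.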
Results and Discussion
The results demonstrated the effectiveness of the Extreme Gradient Boosting model in predicting nitrate contamination in groundwater. The overall accuracy of the model was 0.86, which allowed the contamination status of the study area to be distinguished well. The other performance criteria also indicated that the model reliably separated contaminated from uncontaminated areas: the area under the ROC curve was 0.85, and the recall was 0.80, meaning that 80% of the truly contaminated locations were correctly identified. The F1-score, which is the harmonic mean of precision and recall, was 0.83, indicating a good balance between the two measures and reliable overall performance. The sensitivity analysis showed that certain input variables had a significant effect on the spatial estimation of nitrate contamination. Among the ten environmental and anthropogenic factors examined, precipitation (21%) and elevation changes (18%) were the most influential variables in determining the spatial pattern of nitrate contamination. These findings highlight the importance of the natural and geomorphological characteristics of the region in controlling the dispersion and accumulation of nitrate in groundwater, and they can guide future studies and the development of targeted management strategies.
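To make the reported evaluation criteria concrete, the sketch below shows how accuracy, precision, recall, and F1-score follow from a confusion matrix. The counts in `example` are hypothetical and are not taken from the study. The helper `precision_from_f1` simply rearranges F1 = 2PR / (P + R); applied to the reported recall (0.80) and F1-score (0.83), it suggests a precision of roughly 0.86, although rounding in the reported values makes this only approximate.

```python
def classification_metrics(tp, fp, fn, tn):
    """Standard binary classification metrics from confusion-matrix counts."""
    precision = tp / (tp + fp)  # share of predicted-contaminated that are truly contaminated
    recall = tp / (tp + fn)     # share of truly contaminated that were detected
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
    }


def precision_from_f1(f1, recall):
    """Solve F1 = 2PR / (P + R) for P, given F1 and R."""
    return f1 * recall / (2 * recall - f1)


# Hypothetical validation counts, chosen only so that recall equals 0.80.
example = classification_metrics(tp=8, fp=1, fn=2, tn=9)

# Precision implied by the recall and F1-score reported in the abstract.
implied_precision = precision_from_f1(f1=0.83, recall=0.80)
```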
Conclusion and Suggestions
An important outcome of this study was the production of hazard maps that clearly identified areas at high risk of nitrate contamination in the central part of the plain. Water resource managers and urban and rural planners are encouraged to use these maps as a tool for taking preventive measures in sensitive areas. Notably, the role of human activities in increasing the risk of nitrate contamination was supported by the significant overlap of high-risk areas with agricultural land use. Based on these findings, it is suggested that nitrogen fertilizer application be optimized to protect groundwater resources and support the sustainable management of agricultural activities.
