Article type: Research
Authors
1 Department of Natural Resources Engineering, Faculty of Agricultural Sciences and Natural Resources, University of Hormozgan, Bandar Abbas, Iran
2 Associate Professor of Natural Resources Engineering and Geomorphology, Faculty of Agriculture and Natural Resources, University of Hormozgan, Bandar Abbas, Iran
3 Department of Green Space, Faculty of Geography and Environmental Planning, University of Sistan and Baluchestan, Zahedan, Iran
Abstract
Keywords
Article title [English]
Authors [English]
Air pollution, especially rising concentrations of PM2.5 particles, has emerged as one of the major environmental challenges of recent years. Sistan and Baluchestan province lies in the path of the 120-day winds of Sistan and has experienced declining annual rainfall, which together provide ideal conditions for dust events to form and intensify. Teleconnections play an important role in climate change and, consequently, in air quality. The main objective of this research is to predict the impact of teleconnection indices on PM2.5 changes in Sistan and Baluchestan province using advanced machine learning models. For this purpose, meteorological data and PM2.5 concentrations were collected from the Zahedan and Khash stations over two decades and combined with teleconnection indices. Then, using correlation analysis and feature selection methods, five machine learning models were evaluated to identify the best model for long-term forecasting. The study contributes to a better understanding of the complex relationships between climate variability and air quality, and its analytical framework offers policymakers a practical tool for air pollution management.
Materials and Methods
This study used a multi-stage analytical framework. Meteorological data and PM2.5 concentrations were collected from the Zahedan and Khash stations for the period 2000 to 2021 and supplemented with teleconnection index data from the NOAA Climate Prediction Center. After data preprocessing (quality control, temporal synchronization, and imputation of missing values), a two-step analysis was carried out: first, Pearson correlation analysis measured the linear relationships between the teleconnection indices and PM2.5 levels; then the Boruta algorithm identified the most influential features at time lags of 0 to 6 months. Five machine learning models (Bagged CART, LightGBM, Gradient Boosting, Random Forest, and XGBoost) were evaluated, with 70% of the data used for model training and the remaining 30% for validation. Performance was assessed with three criteria: root mean square error (RMSE), mean absolute percentage error (MAPE), and coefficient of determination (R²). Four techniques were used to interpret the models: permutation feature importance (PFI), SHAP values based on game theory, Sobol sensitivity analysis, and partial dependence plots (PDP). All analyses were performed in the R software environment.
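The 70/30 split and the three evaluation criteria named above can be illustrated with a minimal sketch. It is written in plain Python for readability, whereas the study itself was carried out in R. Several points are assumptions rather than details given in the text: the split is taken to be chronological, R² is computed as 1 - SS_res/SS_tot (some toolkits report the squared correlation instead), MAPE is expressed in percent, and the function names and sample values are hypothetical.

```python
import math

def chronological_split(series, train_tenths=7):
    """Split a time-ordered sequence: first 70% for training, the rest for validation.
    Assumption: the abstract does not say whether the split was chronological or random."""
    cut = (len(series) * train_tenths) // 10
    return series[:cut], series[cut:]

def rmse(y_true, y_pred):
    """Root mean square error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent (undefined if any true value is 0)."""
    return 100.0 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)

def r_squared(y_true, y_pred):
    """Coefficient of determination, assumed here to be 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Hypothetical values for illustration only; these are not data from the study.
months = list(range(10))
train, validation = chronological_split(months)

observed = [20.0, 25.0, 30.0, 35.0, 40.0]
predicted = [21.0, 24.0, 30.0, 36.0, 39.0]
scores = {
    "rmse": rmse(observed, predicted),
    "mape": mape(observed, predicted),
    "r2": r_squared(observed, predicted),
}
print(train, validation)
print(scores)
```

Lower RMSE and MAPE and an R² closer to 1 indicate a better fit, which is the sense in which the models are ranked in the results.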
Results and Discussion
The results showed that the teleconnection indices have a significant effect on PM2.5 concentrations at the Zahedan and Khash stations. At Zahedan, the PDO index showed the highest positive correlation (0.158 at a 5-month lag) and the AMO index the highest positive effect (0.212 at a 0-month lag). In contrast, the AMM index had the highest negative correlation (-0.336 at a 2-month lag) and the WHWP index the strongest negative effect (-0.420 at a 4-month lag). At Khash, the PDO index showed the highest positive correlation (0.159 at a 2-month lag) and the WHWP index the strongest negative effect (-0.385 at a 4-month lag). Feature importance analysis with the Boruta method showed that WHWP is the strongest predictor of PM2.5, with an average importance score of 13.63 at a 6-month lag in Zahedan and 10.51 at a 5-month lag in Khash. In the model evaluation, XGBoost was identified as the superior model, combining high accuracy (R²=0.989 in Zahedan and R²=0.993-0.994 in Khash) with low error (MAPE=2.36-3.07 in Zahedan and MAPE=1.5-1.8 in Khash). Sensitivity analyses showed that AMM has the greatest overall impact, with an importance score of 685 in Zahedan and 561 in Khash, while the WHWP and AMO indices showed complex nonlinear behavior at specific lag times. Collectively, these findings indicate that ocean-atmosphere oscillations have a significant impact on regional air quality, with absolute correlation coefficients ranging from 0.15 to 0.42 and importance scores ranging from 5.6 to 13.6. The performance of the XGBoost model indicates strong potential for long-term PM2.5 forecasting in the study region.
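The lagged correlations reported above pair the value of a teleconnection index in month t - lag with the PM2.5 value in month t. A minimal sketch of that calculation follows, in plain Python rather than the R environment used in the study. The function names and the synthetic series are hypothetical, and the exact alignment convention used by the authors is an assumption.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return sxy / math.sqrt(sxx * syy)

def lagged_correlations(index, pm25, max_lag=6):
    """Correlate index[t - lag] with pm25[t] for lag = 0..max_lag (monthly steps)."""
    if len(index) != len(pm25):
        raise ValueError("series must cover the same months")
    result = {}
    for lag in range(max_lag + 1):
        earlier_index = index[: len(index) - lag]  # index values, shifted back by `lag`
        later_pm25 = pm25[lag:]                    # PM2.5 values `lag` months later
        result[lag] = pearson(earlier_index, later_pm25)
    return result

# Synthetic series for illustration only (not study data): PM2.5 is built to
# respond negatively to the index with a 2-month delay.
n_months = 48
index = [math.sin(0.5 * t) for t in range(n_months)]
pm25 = [30.0 - 5.0 * index[max(t - 2, 0)] for t in range(n_months)]

correlations = lagged_correlations(index, pm25)
strongest_negative_lag = min(correlations, key=correlations.get)
print(correlations)
print(strongest_negative_lag)
```

In this synthetic case the most negative correlation appears at the lag that was built into the data, which is the pattern a statement such as "highest negative correlation at a 2-month lag" describes.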
Conclusion and Suggestions
The correlation analysis showed that the PDO and AMO indices had the strongest positive effect on PM2.5 concentration in Zahedan, while the AMM and WHWP indices had a negative effect at this station. The Boruta results confirm that the WHWP and AMM indices play a key role in predicting PM2.5 at specific time lags, with WHWP having the greatest impact at longer lags (4 to 6 months). In the modeling stage, XGBoost was identified as the best model, with the highest accuracy and the lowest error. SHAP, Sobol, and PDP analyses showed that the Atlantic-related indices (AMM and AMO) have a dominant effect in Zahedan, while the nonlinear behavior of indices such as WHWP in certain value ranges leads to sudden changes in the forecasts. At Khash station, the PDO index showed the highest positive correlation and WHWP the strongest negative impact, indicating the significant influence of oceanic oscillations on air quality. Feature importance analysis with the Boruta method at Khash showed that the AMM, AMO, PDO, and WHWP indices play a key role in predicting PM2.5, while the Tropical Northern Atlantic Index (TNA) and the WP index were rejected due to low importance. XGBoost was again identified as the best model at this station. PDP analyses showed that the climate indices have nonlinear and complex effects on PM2.5: AMM shows oscillatory behavior at different lags, and WHWP causes a sudden decrease in PM2.5 concentration at a lag of 5 months.
Keywords [English]