Article type: Research
Authors
1 PhD student, Department of Natural Resources Engineering, Faculty of Agriculture and Natural Resources, University of Hormozgan, Bandar Abbas, Iran
2 Associate Professor, Department of Natural Resources Engineering, Faculty of Agriculture and Natural Resources, University of Hormozgan, Bandar Abbas, Iran
3 Professor, Department of Natural Resources Engineering, Faculty of Agriculture and Natural Resources, University of Hormozgan, Bandar Abbas, Iran
4 Associate Professor, Department of Green Space, Faculty of Geography and Environmental Planning, University of Sistan and Baluchestan, Zahedan, Iran
Abstract
Keywords
Article Title [English]
Authors [English]
Introduction and Goal
In recent years, air pollution, and in particular the rising concentration of PM2.5 particulate matter, has become one of the most significant environmental challenges. Sistan and Baluchestan province lies in the path of the 120-day winds of Sistan, and this geographical position, together with declining annual rainfall, provides favorable conditions for the formation and intensification of dust events. Teleconnections play an important role in climate change and, consequently, in air quality. The main objective of this research is to evaluate the impact of teleconnection indices on PM2.5 concentrations in Sistan and Baluchestan province using advanced machine learning models. To this end, meteorological data and PM2.5 concentrations were collected from the Zahedan and Khash stations over two decades and combined with teleconnection indices. Then, using correlation analysis and feature selection methods, five machine learning models were evaluated to identify the best model for long-term estimation of PM2.5 concentrations. By providing a detailed analytical framework, the results both improve the understanding of the complex relationships between climate variability and air quality and offer a practical tool for policymakers in air pollution management.
Materials and Methods
This study used a comprehensive multi-stage analytical framework. Meteorological data and PM2.5 concentrations were collected from the Zahedan and Khash stations for the period 2000 to 2021 and supplemented with teleconnection index data from the NOAA Climate Prediction Center. Data preprocessing comprised three steps: quality control (checking for impossible or anomalous PM2.5 measurements and correcting or removing suspicious values), temporal synchronization (matching PM2.5 data and teleconnection indices by date so that the independent and dependent variables refer to the same time steps), and missing-data replacement (using the nearest valid measurement, temporal averaging, and statistical interpolation to preserve the original data distribution). A dual analytical approach was then applied. First, Pearson correlation analysis was used to measure linear relationships between the teleconnection indices and PM2.5 levels. Second, the Boruta algorithm was used to identify the most effective features at time lags of 0 to 6 months. Five advanced machine learning models (Bagged CART, LightGBM, Gradient Boosting, Random Forest, and XGBoost) were evaluated, with 70% of the data used for model training and the remaining 30% for validation. Performance was assessed with three criteria: root mean square error (RMSE), mean absolute percentage error (MAPE), and the coefficient of determination (R²). For model interpretability, four techniques were used: permutation feature importance (PFI), SHAP values based on game theory, Sobol sensitivity analysis, and partial dependence plots (PDP). All analyses were performed in R (version 4.2.0).
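The lagged correlation step described above can be illustrated with a minimal sketch. It is written in Python with the standard library only, not in the R environment the study used, and it is not the authors' code. The function names `pearson` and `lagged_correlations` are hypothetical, and the two monthly series are synthetic, constructed so that PM2.5 responds negatively to the index two months earlier.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("sequences must have equal length of at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


def lagged_correlations(index, pm25, max_lag=6):
    """Correlate the index at month t - lag with PM2.5 at month t."""
    if len(index) != len(pm25):
        raise ValueError("series must cover the same months")
    corrs = {}
    for lag in range(max_lag + 1):
        x = index[: len(index) - lag]  # index values, shifted back by `lag`
        y = pm25[lag:]                 # PM2.5 values `lag` months later
        corrs[lag] = pearson(x, y)
    return corrs


# Synthetic monthly series (assumption: NOT the study's data).
# PM2.5 at month t is a decreasing linear function of the index at month t - 2.
index = [math.sin(0.7 * t) for t in range(30)]
pm25 = [20.0, 20.0] + [20.0 - 5.0 * v for v in index[:-2]]

corrs = lagged_correlations(index, pm25)
best_lag = min(corrs, key=corrs.get)
for lag, r in corrs.items():
    print(f"lag {lag}: r = {r:+.3f}")
print("most negative correlation at lag", best_lag)
```

In this toy setup the most negative coefficient appears at the lag that was built into the data, which is the kind of pattern the lag analysis is meant to surface. Real station data would also need the preprocessing steps listed above before such a calculation.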
Results and Discussion
The results showed that teleconnection indices had a significant effect on PM2.5 concentrations at the Zahedan and Khash stations. At Zahedan station, the strongest positive correlations were with the PDO index (0.158 at a 5-month lag) and the AMO index (0.212 at a 0-month lag), while the strongest negative correlations were with the AMM index (-0.336 at a 2-month lag) and the WHWP index (-0.420 at a 4-month lag). At Khash station, the strongest positive correlation was with the PDO index (0.159 at a 2-month lag) and the strongest negative correlation was with the WHWP index (-0.385 at a 4-month lag). Feature importance analysis using the Boruta method showed that the WHWP index had the greatest role in predicting PM2.5, with a mean importance of 13.63 at a 6-month lag in Zahedan and 10.51 at a 5-month lag in Khash. In the model evaluation, XGBoost was identified as the best model, with very high accuracy (R²=0.989 in Zahedan and R²=0.993 to 0.994 in Khash) and minimal error (MAPE=2.36 to 3.07 in Zahedan and MAPE=1.5 to 1.8 in Khash). The sensitivity analyses showed that the AMM index had the greatest effect (with an importance score of 685 in Zahedan and 561 in Khash). At certain lag times, the behavior of the WHWP and AMO indices was nonlinear and complex. Overall, the results indicate significant effects of ocean-atmosphere oscillations on regional air quality, with absolute correlation coefficients ranging from 0.15 to 0.42 and importance scores ranging from 5.6 to 13.6. The XGBoost model was highly accurate in long-term PM2.5 estimation in the study region.
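For readers who want to see how the study's three evaluation criteria (RMSE, MAPE, and R²) are typically computed, here is a minimal Python sketch using only the standard library. It is an illustration, not the authors' R code. It assumes R² is defined as one minus the ratio of residual to total sum of squares (the abstract does not state which definition was used), and the observed and predicted values are made up for the example.

```python
import math


def rmse(observed, predicted):
    """Root mean square error."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def mape(observed, predicted):
    """Mean absolute percentage error, in percent.

    Undefined when any observed value is zero.
    """
    n = len(observed)
    return 100.0 * sum(abs((o - p) / o) for o, p in zip(observed, predicted)) / n


def r_squared(observed, predicted):
    """Coefficient of determination as 1 - SS_res / SS_tot (assumed definition)."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Hypothetical PM2.5 values for illustration only (NOT the study's data).
observed = [10.0, 20.0, 30.0, 40.0]
predicted = [11.0, 19.0, 33.0, 38.0]

rmse_value = rmse(observed, predicted)
mape_value = mape(observed, predicted)
r2_value = r_squared(observed, predicted)
print(f"RMSE = {rmse_value:.3f}, MAPE = {mape_value:.2f}%, R2 = {r2_value:.3f}")
```

RMSE is in the units of the measurement, MAPE is relative to each observed value, and R² compares the model against simply predicting the mean, so the three criteria are best read together rather than in isolation.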
Conclusion and Suggestions
The results of this study showed that, at Zahedan station, PM2.5 concentrations had the strongest positive correlations with the PDO and AMO indices and the strongest negative correlations with the AMM and WHWP indices. XGBoost was identified as the best prediction model, combining high accuracy with the lowest error. The SHAP and PDP analyses also showed that the effects of the AMM and WHWP indices on PM2.5 concentrations were complex and nonlinear, and that the time lag of these effects was very important. At Khash station, the AMM, AMO, PDO, and WHWP indices likewise played an important role in predicting PM2.5, indicating significant effects of climate fluctuations on air quality. These findings highlight the importance of nonlinear relationships and critical thresholds in air quality modeling. Based on these results, it is suggested that teleconnection indices be monitored continuously so that periods of high particulate matter concentration can be anticipated and preventive decisions and actions taken. It is also suggested that, given the differences observed between the Zahedan and Khash stations, more attention be paid to local and regional characteristics in air quality modeling.
Keywords [English]