Xgboost random forest

11/24/2023

In many parts of the world, landslides are a major natural hazard that threatens human life, causes economic losses and damages the environment. Over the last two decades or so, international disaster-management organizations, together with government and research institutions, have focused on producing hazard assessment methodologies and portraying the spatial distribution of hazards in maps. Any landslide hazard assessment methodology has to consider several processes: production of a landslide inventory map (LIM), determination of the optimum combination of factors, selection of a method for preparing landslide susceptibility maps (LSMs), and performance analysis. To produce a reliable and accurate map of a region's susceptibility to landslides, a prerequisite is information on the spatial and temporal frequency of landslides. Another challenge researchers face is that the Earth's surface is not uniform, so the factors that trigger landslides vary from one area to another. In LSM studies, the landslide inventory is also needed for producing models, assessing the accuracy of the resulting map and validating the output scores. Researchers must also decide the best combination of factors to build the desired prediction model for each study area. In one study, principal component analysis was applied to select significant and independent factors from 17 contributing factors. Tsangaratos and Ilia stated that multicollinearity analysis can be used to determine the conditional independence among variables during feature selection; as a result, six factors (slope, distance from fault, aspect, lithology, elevation, and settlement density) were selected as the explanatory features contributing to landslide occurrence in their study area.
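Multicollinearity among candidate factors is often screened with the variance inflation factor (VIF). The sketch below is an assumption about how such a check could look, not the procedure Tsangaratos and Ilia used: each factor is regressed on all the others with NumPy least squares, and VIF = 1 / (1 - R²). A common rule of thumb flags factors with a VIF above 5 or 10. The factor names in the demo are hypothetical and the data are random.

```python
import numpy as np


def variance_inflation_factors(X: np.ndarray) -> np.ndarray:
    """Return the VIF of each column of X (rows = mapping units, columns = factors).

    VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from an ordinary least-squares
    regression of column j on all other columns plus an intercept.
    """
    n, p = X.shape
    vifs = np.empty(p)
    for j in range(p):
        y = X[:, j]
        others = np.delete(X, j, axis=1)
        # Intercept column so that R^2 is measured around the mean of y.
        A = np.column_stack([np.ones(n), others])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        residuals = y - A @ coef
        ss_res = float(residuals @ residuals)
        ss_tot = float(((y - y.mean()) ** 2).sum())
        r2 = 1.0 - ss_res / ss_tot
        # A perfectly collinear factor has an unbounded VIF.
        vifs[j] = np.inf if r2 >= 1.0 else 1.0 / (1.0 - r2)
    return vifs


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    slope = rng.normal(size=500)      # hypothetical factor
    elevation = rng.normal(size=500)  # hypothetical factor, independent of slope
    # Hypothetical factor built almost entirely from slope, so strongly collinear.
    slope_copy = slope + 0.05 * rng.normal(size=500)
    X = np.column_stack([slope, elevation, slope_copy])
    print(variance_inflation_factors(X).round(1))
```

In the demo, the two slope-derived columns receive very large VIFs while the independent column stays close to 1, which is the pattern that would prompt dropping one of a collinear pair.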
The performance of the ensemble models was validated using several accuracy metrics: area under the curve (AUC), overall accuracy (OA), root mean square error (RMSE), and the Kappa coefficient. The Wilcoxon signed-rank test was also used to assess differences between the optimum models. The XGBoost_Opt model (the model built with the optimum factor combination) had the highest prediction capability (OA = 0.8501, AUC = 0.8976), followed by RF_Opt (OA = 0.8336, AUC = 0.8860) and GBM_Opt (OA = 0.8244, AUC = 0.8796). The Wilcoxon signed-rank test confirmed that the differences between XGBoost_Opt, the best subset combination, and the other models were statistically significant. Overall, XGBoost with the optimum factor combination achieved lower prediction error and higher accuracy than the other ensemble methods.
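The four accuracy metrics named above can be computed in a few lines of plain Python. This is a minimal sketch, assuming a binary landslide (1) / non-landslide (0) test set, a 0.5 probability threshold for the class map, and RMSE taken between predicted probabilities and the 0/1 labels; the study does not state these details here, and the sample values are hypothetical.

```python
import math
from typing import Sequence


def overall_accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Share of test cells whose predicted class matches the observed class."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as the probability that a random landslide cell scores higher
    than a random non-landslide cell (ties count as one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for sp in pos:
        for sn in neg:
            wins += 1.0 if sp > sn else 0.5 if sp == sn else 0.0
    return wins / (len(pos) * len(neg))


def rmse(y_true: Sequence[int], probs: Sequence[float]) -> float:
    """Root mean square error between predicted probabilities and 0/1 labels."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, probs)) / len(y_true))


def cohen_kappa(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Cohen's Kappa: agreement beyond chance for a binary class map."""
    n = len(y_true)
    p_o = overall_accuracy(y_true, y_pred)
    true_pos_rate = sum(y_true) / n
    pred_pos_rate = sum(y_pred) / n
    # Expected agreement if predictions were independent of the labels.
    p_e = true_pos_rate * pred_pos_rate + (1 - true_pos_rate) * (1 - pred_pos_rate)
    return (p_o - p_e) / (1 - p_e)


if __name__ == "__main__":
    # Hypothetical test-set labels and predicted landslide probabilities.
    y_true = [1, 1, 1, 0, 0, 0]
    probs = [0.9, 0.8, 0.4, 0.3, 0.2, 0.6]
    y_pred = [int(p >= 0.5) for p in probs]
    print(f"OA={overall_accuracy(y_true, y_pred):.4f}")  # OA=0.6667
    print(f"AUC={auc(y_true, probs):.4f}")  # AUC=0.8889
    print(f"RMSE={rmse(y_true, probs):.4f}")  # RMSE=0.3873
    print(f"Kappa={cohen_kappa(y_true, y_pred):.4f}")  # Kappa=0.3333
```

The pairwise AUC loop is quadratic and meant only for illustration; for real rasters a rank-based or library implementation is faster. The Wilcoxon signed-rank comparison is not shown: SciPy provides it as scipy.stats.wilcoxon, which takes two paired samples (for example, per-cell errors of two models on the same test cells; how the study paired its samples is not stated here).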

Fifteen landslide causative factors and 105 landslide locations that occurred in the region were used. The landslide inventory was randomly divided into training (70%) and testing (30%) datasets to construct the RF, XGBoost and GBM prediction models. The symmetrical uncertainty measure was used to determine the most important causative factors, and the selected features were then used to construct the susceptibility prediction models.
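Symmetrical uncertainty (SU) normalizes the information gain between a factor X and the landslide label Y to the range 0 to 1: SU = 2 × IG / (H(X) + H(Y)), where IG = H(X) + H(Y) − H(X, Y). The sketch below computes it for already discretized factors (continuous factors such as slope would need binning first, which is an assumption here) and adds a seeded random 70/30 split. The helper names and the demo factors are hypothetical.

```python
import math
import random
from collections import Counter
from typing import Hashable, List, Sequence, Tuple


def entropy(values: Sequence[Hashable]) -> float:
    """Shannon entropy, in bits, of a discrete sequence."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def symmetrical_uncertainty(x: Sequence[Hashable], y: Sequence[Hashable]) -> float:
    """SU = 2 * IG / (H(X) + H(Y)), with IG = H(X) + H(Y) - H(X, Y)."""
    h_x, h_y = entropy(x), entropy(y)
    if h_x + h_y == 0:
        return 0.0  # both variables constant: no information to share
    h_xy = entropy(list(zip(x, y)))  # joint entropy of (factor, label) pairs
    return 2.0 * (h_x + h_y - h_xy) / (h_x + h_y)


def split_70_30(samples: Sequence, seed: int = 42) -> Tuple[List, List]:
    """Seeded random split: 70% of samples for training, the rest for testing."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = (7 * len(shuffled)) // 10  # integer arithmetic avoids float rounding
    return shuffled[:cut], shuffled[cut:]


if __name__ == "__main__":
    landslide = [1, 1, 1, 1, 0, 0, 0, 0]
    lithology = ["a", "a", "a", "a", "b", "b", "b", "b"]  # hypothetical factor
    aspect = ["N", "S"] * 4  # hypothetical factor
    print(symmetrical_uncertainty(lithology, landslide))  # 1.0
    print(symmetrical_uncertainty(aspect, landslide))  # 0.0
    train, test = split_70_30(range(105))  # hypothetical inventory IDs
    print(len(train), len(test))  # 73 32
```

Factors would then be ranked by SU and the low-scoring ones dropped; the cut-off is a modelling choice that the study does not state here.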

Decision tree-based classifier ensembles are a machine learning (ML) technique that combines several tree models into a single, more effective predictive model, and they typically outperform a single model. Selecting a suitable ML algorithm therefore helps us analyze past landslides more accurately and anticipate where future ones may occur. The main purpose of this study is to produce a landslide susceptibility map of the Ayancik district of Sinop province, in the Black Sea region of Turkey, using three tree-based ensemble methods: gradient boosting machines (GBM), extreme gradient boosting (XGBoost), and random forest (RF).
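The three ensembles named above can be compared with a short scikit-learn sketch. It is not the study's pipeline: the data come from make_classification as a synthetic stand-in for a 15-factor table, the hyperparameters are illustrative rather than tuned, and XGBoost (through the xgboost package's XGBClassifier) is only added when that package is installed.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for a factor table: 15 columns play the role of 15
# causative factors; label 1 = landslide cell, 0 = non-landslide cell.
X, y = make_classification(n_samples=600, n_features=15, n_informative=6,
                           random_state=0)
# 70% training / 30% testing, stratified so both sets keep the class ratio.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

# Illustrative (untuned) hyperparameters.
models = {
    "RF": RandomForestClassifier(n_estimators=300, random_state=0),
    "GBM": GradientBoostingClassifier(n_estimators=200, learning_rate=0.1,
                                      max_depth=3, random_state=0),
}
try:
    from xgboost import XGBClassifier
    models["XGBoost"] = XGBClassifier(n_estimators=200, learning_rate=0.1,
                                      max_depth=4, random_state=0)
except ImportError:
    pass  # xgboost is optional here; RF and GBM still run without it.

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    proba = model.predict_proba(X_test)[:, 1]  # predicted landslide probability
    oa = accuracy_score(y_test, (proba >= 0.5).astype(int))
    results[name] = (oa, roc_auc_score(y_test, proba))
    print(f"{name}: OA={results[name][0]:.3f} AUC={results[name][1]:.3f}")
```

The design difference behind the comparison: random forest averages many deep trees grown on bootstrap samples, while GBM and XGBoost add shallow trees one at a time, each fitted to the errors of the current ensemble, and XGBoost additionally penalizes tree complexity through regularization.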
