Mapping landslide susceptibility and types using Random Forest

City:	Piedmont, Italy
Organization:	SpaceTimeLab, Department of Civil, Environmental and Geomatic Engineering, University College London, UK
Project Start Date:	Unspecified
Project End Date:	April 3, 2018 (First Received)
Reference:	Taalab, K., Cheng, T., & Zhang, Y. (2018). Mapping landslide susceptibility and types using Random Forest. Big Earth Data, 2(2), 159-178. doi:10.1080/20964471.2018.1472392
Problem:	Background Problem:
· Piedmont, Italy, is a landslide “hotspot”.
· Landslides are a natural hazard with serious consequences, including destruction of property, irregular landscape morphology, and casualties.
· Landslide susceptibility maps (LSMs) are typically produced at small scales and predict the location of a single landslide type across a region. Producing them this way is often labour-intensive and inefficient.
Objective:
· The study uses Random Forest (RF) to determine whether data-mining LSMs can be extended to a large, heterogeneous area containing many landslide types without losing the details specific to each type.
· Comprehensive, heterogeneous LSMs that capture both the susceptibility and the type of landslide are valuable for urban planning, environmental management, and reducing economic losses, because they are easy to interpret.
Technical Solution:	Supervised Learning Model: The heterogeneous LSMs are produced using Random Forest.
Technical Solution Framework:
· Train a binary RF classifier to predict landslide susceptibility.
· Train a multi-class RF classifier to predict the landslide type.
Variable Selection:
Dependent Variables
· Landslide Susceptibility
· Type of Landslide (Landslide Classes)
  o Crash/roll over
  o Sliding rotational/translational
  o Slow dripping
  o Fast dripping
  o Complex
  o DGPV (deep-seated gravitational slope deformation)
  o Collapsing/overturning areas
  o Widespread shallow
  o Multiple
Independent (Control) Variables
· Landslide Prone Areas (Binary Classification)
· Geomorphological Attributes:
  o DEM (Digital Elevation Model)
  o Slope
  o Aspect
  o Curvature
  o Profile Curvature
  o Plan Curvature
  o Parent Material Lithology
  o Land Form
  o TWI (Topographic Wetness Index)
  o Land Cover
  o Distance from Road
  o Distance from River
  o Average Annual Rainfall
  o Hydrological Complex
Data Preparation:
Landslide Classification Model:
· Because not all landslides in the inventory were categorized by type, a subset of 236 715 samples was divided into a training dataset (165 698) and a validation dataset (71 016) to train and test the classification model.
Grid Resolution Sampling:
· A 100 m grid cell format was used; the authors identified this as the most suitable grid size for accurate LSM classification to support decision-making.
  o Landslide areas: sampled on a 100 m grid within the boundaries of past landslides recorded in the inventory.
  o Non-landslide areas: randomly selected on a 100 m grid within the study area, excluding areas within 200 m of existing landslides.
Confusion Matrix:
· The authors assessed map accuracy with a confusion matrix, as recommended by Kavzoglu et al. (2014).
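The non-landslide sampling rule described above (100 m grid, no cells within 200 m of an existing landslide) can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' code: it assumes the landslide inventory has already been rasterised into a boolean NumPy array on the 100 m grid (`landslide_mask`, a hypothetical name), and it uses SciPy's Euclidean distance transform to drop every cell whose centre lies within 200 m of a landslide cell. The function name `sample_non_landslide_cells` is illustrative.

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

CELL_SIZE_M = 100.0  # grid resolution used in the study
BUFFER_M = 200.0     # exclusion distance around recorded landslides


def sample_non_landslide_cells(landslide_mask, n_samples, rng=None):
    """Randomly choose grid cells that lie more than BUFFER_M from any landslide cell.

    landslide_mask: 2-D boolean array on the 100 m grid, True inside recorded
    landslides (a hypothetical, already-rasterised inventory).
    Returns an (n_samples, 2) array of (row, col) indices, without repeats.
    """
    rng = np.random.default_rng(rng)
    # For every non-landslide cell, the Euclidean distance in metres (centre to
    # centre) to the nearest landslide cell; landslide cells themselves get 0.
    dist_m = distance_transform_edt(~landslide_mask, sampling=CELL_SIZE_M)
    candidates = np.argwhere(dist_m > BUFFER_M)
    if n_samples > len(candidates):
        raise ValueError("not enough eligible non-landslide cells")
    picks = rng.choice(len(candidates), size=n_samples, replace=False)
    return candidates[picks]


# Toy example: a 5 km x 5 km study area with one 500 m x 500 m landslide
mask = np.zeros((50, 50), dtype=bool)
mask[20:25, 20:25] = True
cells = sample_non_landslide_cells(mask, n_samples=100, rng=0)
```

Measuring distance between cell centres on the raster is a simplification; buffering the inventory polygons by 200 m in a GIS before rasterising would give a more exact exclusion zone.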
Random Forest Model:
· The authors determined that 200 trees were sufficient to produce stable models, based on a comparison with other experimental results, where:
  o A single decision tree is a weak classifier because it suffers from high variance or high bias.
  o RF balances these sources of error by aggregating a large number of decision trees.
Data Analysis:
Random Forest LSMs: Binary
· The authors used the RF LSMs to perform a binary classification of landslide susceptibility:
  o Black: the area is classified as susceptible to landslides.
  o Grey: the area is classified as NOT susceptible to landslides.
· RF also predicts the class probability, giving a continuous susceptibility scale from high (1) to low (0).
  o Threshold: susceptibility > 0.5.
· On the test dataset, the binary classification model achieved an overall classification accuracy of approximately 88%.
Random Forest LSMs: Multi-class
· For areas with susceptibility > 0.5, the multi-class RF classifier is applied to predict the landslide type. The predictor (control) variables are also ranked in order of importance according to their relative contribution to the model's classification accuracy.
· On the test dataset, the multi-class classification model achieved an overall classification accuracy of 77.26%.
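The two-stage workflow described above (a binary RF with 200 trees for susceptibility, then a multi-class RF that assigns a type wherever susceptibility > 0.5, each scored by overall accuracy from a confusion matrix) can be sketched with scikit-learn. This is a hedged sketch, not the authors' implementation. Assumptions: scikit-learn stands in for whatever software the study used; the data are synthetic stand-ins for the control variables; a stratified 70/30 split approximates the reported 165 698 / 71 016 partition; and permutation importance (the mean drop in accuracy when one predictor is shuffled) is used as one way to rank variables by their contribution to classification accuracy. Helper names such as `fit_two_stage`, `predict_map` and `NOT_SUSCEPTIBLE` are hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

N_TREES = 200          # ensemble size reported in the study
THRESHOLD = 0.5        # susceptibility cut-off reported in the study
NOT_SUSCEPTIBLE = -1   # hypothetical code for cells that receive no type label


def overall_accuracy(y_true, y_pred):
    """Overall accuracy from the confusion matrix: correct (diagonal) / total."""
    cm = confusion_matrix(y_true, y_pred)
    return np.trace(cm) / cm.sum()


def fit_two_stage(X_bin, y_bin, X_type, y_type, seed=0):
    """Stage 1: binary landslide (1) / non-landslide (0) RF.
    Stage 2: multi-class RF trained only on landslide cells whose type is known."""
    binary_rf = RandomForestClassifier(n_estimators=N_TREES, random_state=seed)
    binary_rf.fit(X_bin, y_bin)
    type_rf = RandomForestClassifier(n_estimators=N_TREES, random_state=seed)
    type_rf.fit(X_type, y_type)
    return binary_rf, type_rf


def predict_map(binary_rf, type_rf, X):
    """Per-cell susceptibility in [0, 1], plus a type code where susceptibility > THRESHOLD."""
    landslide_col = np.flatnonzero(binary_rf.classes_ == 1)[0]
    susceptibility = binary_rf.predict_proba(X)[:, landslide_col]
    types = np.full(len(X), NOT_SUSCEPTIBLE)
    susceptible = susceptibility > THRESHOLD
    if susceptible.any():
        types[susceptible] = type_rf.predict(X[susceptible])
    return susceptibility, types


# --- Toy demonstration on synthetic data (NOT the study's data) ---
rng = np.random.default_rng(0)
n = 2000
X = rng.normal(size=(n, 4))  # four stand-in control variables per cell
y_bin = (X[:, 0] + 0.3 * rng.normal(size=n) > 0).astype(int)
y_type = np.digitize(X[:, 1], [-0.5, 0.5])  # three toy landslide types: 0, 1, 2

# ~70/30 train/test splits, stratified by class
Xb_tr, Xb_te, yb_tr, yb_te = train_test_split(
    X, y_bin, test_size=0.3, stratify=y_bin, random_state=0)
is_ls = y_bin == 1
Xt_tr, Xt_te, yt_tr, yt_te = train_test_split(
    X[is_ls], y_type[is_ls], test_size=0.3, stratify=y_type[is_ls], random_state=0)

binary_rf, type_rf = fit_two_stage(Xb_tr, yb_tr, Xt_tr, yt_tr)
acc_bin = overall_accuracy(yb_te, binary_rf.predict(Xb_te))
acc_type = overall_accuracy(yt_te, type_rf.predict(Xt_te))

# Rank predictors by the mean drop in test accuracy when each one is shuffled
imp = permutation_importance(type_rf, Xt_te, yt_te, n_repeats=5, random_state=0)
ranking = np.argsort(imp.importances_mean)[::-1]

susceptibility, types = predict_map(binary_rf, type_rf, Xb_te)
print(f"binary accuracy {acc_bin:.3f}, type accuracy {acc_type:.3f}, ranking {ranking}")
```

On real data, each row of `X` would be one 100 m cell with the control variables as columns; categorical predictors such as lithology or land cover would need encoding (for example, one-hot) before being passed to scikit-learn.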
Datasets Used:	Landslide Inventory (30 439 landslides), SiFRAP, early 20th century to 2006
Outcome:	Pre-Solution Performance:
· Traditional LSMs are typically small-scale and show only the spatial likelihood of a single type of landslide.
· Applying traditional LSM methods at larger scales leads to inconsistencies and challenges, including loss of the details specific to each landslide type and/or heavy resource demands. For example:
  o Conducting a separate analysis for each landslide type and combining the results to predict total susceptibility is labour-intensive.
  o Treating landslides as a single class results in lost detail and reduced accuracy, as distinct types of landslides are often related to the same geomorphological conditioning factors.
· Even newer data-mining methods for producing LSMs, such as artificial neural networks, support vector machines, and decision trees, remain limited because they are effective only for small-to-medium areas (<5000 km²) or for particular landslide classes.
Post-Solution Performance:
· The RF LSMs effectively combine susceptibility mapping and landslide classification, directly addressing the earlier challenges of producing large-scale landslide susceptibility maps for a region exposed to many types of landslides.
· The RF LSMs provide highly accurate susceptibility maps for a large, heterogeneous region without the need to produce separate susceptibility maps.
· The comprehensive LSMs are easier to present to planners and decision-makers because they convey a large amount of information in an easily interpretable form.
· Compared with labour-intensive approaches such as the two-stage modelling process, the single RF model is more efficient at detecting landslide susceptibility.
Issues that arose:	The major source of error was landslide-prone areas being classified as non-susceptible. This is likely because many mountainous sites are highly susceptible to landslides even though no landslides have previously been recorded there, so such sites can end up in the non-landslide sample. The primary research challenge in this case study is therefore how to sample non-landslide areas, that is, how to select genuinely non-susceptible samples using an optimal method. Another limitation of Random Forest is that, although it ranks variables in order of importance, it offers limited insight into the processes those variables represent.
Status:	Operational: Using Random Forest supervised modelling to develop large-scale, heterogeneous LSMs proved highly successful and efficient for predicting both landslide susceptibility and landslide class, and the approach is applicable to regional- or national-scale projects, unlike traditional small-scale LSMs. The overall classification accuracy of the RF LSM model is over 88%, which compares favourably with other data-mining approaches to LSMs, whose accuracy is usually 70-80%.
Entered by:	November 5, 2021: Reham Ebrahim Ali, reham.ebrahimali@mail.utoronto.ca.

CEM1002,
Civil Engineering, University of Toronto
Contact: msf@eil.utoronto.ca