Spatial prediction of flood susceptibility using random-forest and boosted-tree models in Seoul metropolitan city, Korea

City:

Seoul, Korea 

Organization:

  • Department of Geoinformatics, University of Seoul, Seoul, Korea 
  • National Institute of Ecology, Seocheon, Korea 
  • Center for Environmental Assessment Monitoring, Environmental Assessment Group, Korea Environment Institute (KEI), Sejong, Korea 
  • Geological Research Division, Korea Institute of Geoscience and Mineral Resources (KIGAM), Daejeon, Korea 
  • Korea University of Science and Technology, Daejeon, Korea 

Project Start Date:

Unknown 

Project End Date:

Mar. 8th, 2017 (Accepted by journal)

Reference:

Sunmin Lee, Jeong-Cheol Kim, Hyung-Sup Jung, Moung Jin Lee & Saro Lee (2017) Spatial prediction of flood susceptibility using random-forest and boosted-tree models in Seoul metropolitan city, Korea, Geomatics, Natural Hazards and Risk, 8:2, 1185-1203, DOI: 10.1080/19475705.2017.1308971 

Problem:

In the context of global climate change, abnormal weather has caused significant losses worldwide. The frequency and intensity of major floods increased into the 21st century, and Seoul experienced the same trend at the local scale. The flood risk map produced by the government is limited to the concept of risk; there is no flood susceptibility map that combines sophisticated numerical analysis with vulnerability to flood disasters. This study addresses that gap, and the resulting maps are intended to serve as a reference for future government decision-making. 

Technical Solution:

The study applied random-forest and boosted-tree machine-learning (ML) models to predict the spatial distribution of flood susceptibility in Seoul metropolitan city, Korea. The statistical program STATISTICA and the GIS program ArcGIS were the primary tools. The main steps were as follows:

  • Flooded areas in Seoul for 2010 and 2011 were extracted; the 2010 data were used as training data and the 2011 data as validation data.
  • Flood-related factors were collected or calculated from topographic, land-use, soil and geological maps; in total, 12 susceptibility factors were selected.
  • The geospatial data were analyzed and resampled in ArcGIS during data preparation.
  • The data were converted from ArcGIS ASCII format into STATISTICA format, where the random-forest and boosted-tree models were applied. The 2010 flooded area was used as training data, with the factors set as independent variables. For each model, both a classification model (mode of the classes) and a regression model (mean prediction) were run.
  • Predictor importance values were used to assess the relative significance of each factor (independent variable).
  • The 2011 flooded area data were used to validate the results. A receiver operating characteristic (ROC) curve was constructed, and the area under the curve (AUC) was calculated to evaluate the prediction ability of each model.
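The two random-forest aggregation rules named above (mode of the classes for classification, mean prediction for regression) can be illustrated with a minimal Python sketch. It assumes a forest has already produced one output per tree for a grid cell; the fitting itself was done in STATISTICA and is not reproduced here, and the function names and example outputs are hypothetical. Boosted trees combine sequentially fitted trees additively rather than by a majority vote, so this sketch covers the random-forest case only.

```python
from collections import Counter
from statistics import mean


def aggregate_classification(tree_votes):
    """Random-forest classification: return the class predicted by most trees (the mode)."""
    # most_common(1) returns [(class, count)]; ties go to the class seen first.
    return Counter(tree_votes).most_common(1)[0][0]


def aggregate_regression(tree_predictions):
    """Random-forest regression: return the mean of the individual tree predictions."""
    return mean(tree_predictions)


# Hypothetical outputs of five trees for one grid cell (1 = flooded, 0 = not flooded).
votes = [1, 0, 1, 1, 0]
scores = [1.0, 0.0, 0.5, 0.75, 0.25]

print(aggregate_classification(votes))  # 1
print(aggregate_regression(scores))     # 0.5
```

Tie-breaking here simply follows `Counter.most_common` ordering; a production implementation may resolve ties differently.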

Datasets Used:

  • Dataset 1: Digital topographic map, National Geographic Information Institute (NGII), Data type: Grid, Scale: 1:1,000, contains the following: 
    1. Ground elevation (m)
    2. Slope gradient
    3. Distance from the river (m)
    4. Slope Length Factor (SLF)
    5. Topographic wetness index (TWI)
    6. Plan curvature
  • Dataset 2: Land-use & land registration map, Korea Ministry of Environment, Data type: Polygon, Scale: 1:1,000, contains the following:
    1. Impermeability layer
    2. Green Infra Farmland
    3. Retarding basin
  • Dataset 3: Detailed soil map, Rural Development Administration, Data type: Polygon, Scale: 1:25,000, contains the following:
    1. Soil drainage
  • Dataset 4: Geological map, Korea Institute of Geoscience & Mineral Resources, Data type: Polygon, Scale: 1:25,000, contains the following:
    1. Geology
  • Dataset 5: Hazard map, produced by the authors, Data type: Polygon, Scale: 1:1,000, contains the following:
    1. Flooded area (both 2010 and 2011)
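Several of the topographic factors in Dataset 1 are derived from the elevation grid rather than read directly from the map. As an illustration, the topographic wetness index is conventionally defined as TWI = ln(a / tan β), where a is the specific catchment area (upslope contributing area per unit contour width) and β is the local slope angle. The sketch below assumes a and β have already been derived from the DEM (for example with flow-accumulation and slope tools in ArcGIS); the summary does not state how the authors computed them, and the minimum-slope floor is a hypothetical choice to avoid dividing by zero on flat cells.

```python
import math


def topographic_wetness_index(specific_catchment_area, slope_degrees, min_slope_degrees=0.1):
    """Return TWI = ln(a / tan(beta)) for one grid cell.

    specific_catchment_area: upslope contributing area per unit contour width (m).
    slope_degrees: local slope angle beta, in degrees.
    min_slope_degrees: hypothetical floor so flat cells (beta = 0) do not divide by zero.
    """
    if specific_catchment_area <= 0:
        raise ValueError("specific catchment area must be positive")
    beta = math.radians(max(slope_degrees, min_slope_degrees))
    return math.log(specific_catchment_area / math.tan(beta))


# Same contributing area, different slopes: the gentler cell is "wetter".
steep = topographic_wetness_index(100.0, 45.0)
gentle = topographic_wetness_index(100.0, 2.0)
print(round(steep, 3))  # 4.605
print(gentle > steep)   # True
```

Higher TWI values mark cells where water tends to accumulate, which is why the index is a common input to flood susceptibility models.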

Outcome:

Four susceptibility maps were produced: random-forest (regression), random-forest (classification), boosted-tree (regression) and boosted-tree (classification). Validation by AUC showed that the random-forest model achieved 78.78% accuracy with the regression model and 79.18% with the classification model, while the boosted-tree model achieved 77.55% with the regression model and 77.26% with the classification model. All four results were considered satisfactory, with accuracies above 75%. 
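The validation figures above are AUC values expressed as percentages (for example, 79.18% corresponds to an AUC of 0.7918). A minimal sketch of the standard rank-based ROC AUC follows, assuming each validation cell has a predicted susceptibility score and a 2011 flooded/non-flooded label. The summary does not say exactly how STATISTICA built the curve, so this illustrates the metric rather than the authors' procedure; the function name and sample values are hypothetical.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve (Mann-Whitney formulation).

    scores: predicted susceptibility for each validation cell.
    labels: 1 if the cell was flooded in the validation year, else 0.
    Returns the probability that a randomly chosen flooded cell scores higher
    than a randomly chosen non-flooded cell, counting ties as one half.
    """
    flooded = [s for s, y in zip(scores, labels) if y == 1]
    dry = [s for s, y in zip(scores, labels) if y == 0]
    if not flooded or not dry:
        raise ValueError("need at least one flooded and one non-flooded cell")
    wins = 0.0
    for f in flooded:
        for d in dry:
            if f > d:
                wins += 1.0
            elif f == d:
                wins += 0.5
    return wins / (len(flooded) * len(dry))


# Hypothetical validation sample: two flooded and three non-flooded cells.
scores = [0.9, 0.4, 0.5, 0.3, 0.2]
labels = [1, 1, 0, 0, 0]
print(round(roc_auc(scores, labels), 4))  # 0.8333
```

The pairwise loop is O(n·m) and only suits small samples; for a city-wide raster, a sort-based rank computation gives the same value far more cheaply.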

Issues that arose:

  • Flood inundation is difficult to analyze spatially for the hazard map, because flood damage changes constantly and the tidal structure of the inundation area could not be monitored effectively in real time.
  • The hazard map data were compiled within the administrative-district system, so incorrect location information could dramatically affect the results.
  • Water supply and drainage facilities were identified as factors that might significantly affect flooding, but they were not included in the analysis because the data were difficult to obtain.

Status:

Terminated

Entered by:

Nov. 30th, 2020. Jiawei Liu, jiawei.liu@mail.utoronto.ca 



CEM1002, 

Civil Engineering, University of Toronto 

Contact: msf@eil.utoronto.ca