Locating Lead Pipes using Predictive Modeling in Flint, MI 

City:

Flint, Michigan, United States

Organization:

Research team (Georgia Tech, University of Michigan, BYU) in collaboration with Flint city officials and FAST Start

Project Start Date:

June 2016

Project End Date:

October 2017

Reference:

Abernethy J, Chojnacki A, Farahi A, Schwartz E, & Webb J. 2018. ActiveRemediation: The Search for Lead Pipes in Flint, Michigan. In KDD ’18: The 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, August 19–23, 2018, London, United Kingdom. ACM, New York, NY, USA, Article 4, 10 pages. https://doi.org/10.1145/3219819.3219896

Madrigal A. “How a Feel-Good AI Story Went Wrong in Flint.” The Atlantic, Atlantic Media Company, January 5, 2019. www.theatlantic.com/technology/archive/2019/01/how-machine-learning-found-flints-lead-pipes/578692/.

Problem:

In 2014, the City of Flint switched its water source to a local river, planning to treat the water at a new plant. However, the water was not treated properly, and it corroded the pipes, leaching lead into the drinking water. This resulted in one of the worst public health disasters in recent American history. The best solution was to replace the lead water pipes.


The City of Flint did not have reliable records of where the lead pipes were located, and digging up pipes to test whether they are made of hazardous materials is very costly. Although a less costly pipe identification method was available (hydro-vacuum, or hydrovac, inspection), the cost of pipe inspection still had to be minimized. Ultimately, the city needed to decide whether a home's water pipes should be inspected, and by which method, given the available budget.

Technical Solution:

The team developed a framework called ActiveRemediation, which lays out a data-driven approach to replacing hazardous water infrastructure at large scale. The framework consisted of:

  1. STATISTICAL MODEL
    • purpose: assigns a probability that a service line contains hazardous materials 
    • model type: combination of a predictive model (XGBoost, a gradient-boosted decision tree classifier) and a hierarchical Bayesian spatial model
      • Also tested: random forest, lasso
    • inputs: known parcel features (e.g., house age) and homes with known/labeled service line materials are used to train the predictive model; the outputs of the predictive model (probabilities that service lines are made of hazardous material) serve as parameters for the Bayesian model.
      • Known/labeled homes were pulled from the Data Collection App, so the model could continually re-estimate the best homes for inspection/replacement using the most current data available.
    • The Bayesian model moderates spatial effects in the classification tree's estimates by pulling parameters for precincts with little information toward the city-wide distribution.
    • output: spatially adjusted probability that a home has hazardous pipe material 
  2. INSPECTION/REPLACEMENT DECISION RULE
    • purpose: allocates scarce monetary resources to find and replace hazardous service lines using various methods of inspection 
    • model type: Importance Weighted Active Learning (IWAL(0.7))
    • inputs: probability that a house has hazardous pipe material (output from STATISTICAL MODEL)
    • output: decision of which homes should be inspected in the next round of inspections and what type of inspection should be performed
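
A minimal Python sketch of the two components above, under stated assumptions. It is not the authors' implementation: the pooling step uses a simple Beta-binomial shrinkage in place of the full hierarchical Bayesian spatial model, and the sampling step only illustrates the importance-weighting idea behind IWAL. The function names, `prior_strength`, `p_min`, the query-probability rule, and the sample data are all hypothetical, and the 0.7 parameter of IWAL(0.7) is not modeled.

```python
import random


def pooled_precinct_rates(inspections, prior_strength=10.0):
    """Shrink each precinct's observed hazardous rate toward the city-wide rate.

    inspections maps precinct -> (homes_inspected, hazardous_found).
    prior_strength (hypothetical) acts like a number of pseudo-inspections
    that carry the city-wide rate.
    """
    total_n = sum(n for n, _ in inspections.values())
    total_k = sum(k for _, k in inspections.values())
    citywide = total_k / total_n
    # Posterior mean under a Beta prior centered on the city-wide rate:
    # precincts with few inspections stay close to the city-wide value.
    return {
        precinct: (k + prior_strength * citywide) / (n + prior_strength)
        for precinct, (n, k) in inspections.items()
    }


def importance_weighted_picks(probs, p_min=0.1, seed=0):
    """Randomly choose homes to inspect and attach importance weights.

    probs maps home -> estimated probability of a hazardous service line.
    Homes the model is least sure about (probability near 0.5) are sampled
    most often; each sampled home gets weight 1 / query_prob so that later
    estimates can correct for the uneven sampling.
    """
    rng = random.Random(seed)
    picks = {}
    for home, p in probs.items():
        # Hypothetical query rule: 1.0 at p = 0.5, floored at p_min.
        query_prob = max(p_min, 1.0 - abs(2.0 * p - 1.0))
        if rng.random() < query_prob:
            picks[home] = 1.0 / query_prob
    return picks


# Made-up data: precinct B has only two inspections.
rates = pooled_precinct_rates({"A": (200, 180), "B": (2, 0)})
picks = importance_weighted_picks({"h1": 0.5, "h2": 0.95, "h3": 0.6})
```

In this made-up example, precinct B's raw rate of 0 percent from two inspections is pulled most of the way toward the city-wide rate, while precinct A's estimate barely moves. Home `h1` is always sampled because its query probability is 1.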

Datasets Used:

  • Dataset 1: Parcel Data, City of Flint, 2015
  • Dataset 2: Records of Service Lines, City of Flint, 2015
  • Dataset 3: MDEQ sample inspection (n=3,000), 2016
  • Datasets 4 and 5: Pilot and Phase One pipe replacements, 2016
  • Dataset 6: Contractor Data Collection (via App), Contractors, 2016 (continuously updated)

Outcome:

Before Model Implementation

Note: The homes selected for these phases were high-risk: they had vulnerable residents (pregnant people, children) and observed high lead levels.

Pilot: 33 of 36 homes had hazardous pipes (all pipes were dug up for replacement) 

Phase One: 165 of 171 homes needed replacement (96%, whereas City records indicated only 40%)
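
The hit rates implied by these counts can be checked with a few lines of Python. This is only a worked calculation from the figures above; the helper name `hit_rate` is introduced here for illustration.

```python
def hit_rate(hazardous_found, homes_excavated):
    """Share of excavated homes that actually had hazardous pipes."""
    return hazardous_found / homes_excavated


pilot = hit_rate(33, 36)
phase_one = hit_rate(165, 171)

print(f"Pilot: {pilot:.1%}")          # Pilot: 91.7%
print(f"Phase One: {phase_one:.1%}")  # Phase One: 96.5%
```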


Post-Model Implementation Results

Theoretical results, based on ACTUAL FLINT, a set of 6,506 homes that had already been inspected or replaced and were recorded in the Data Collection App:

Hit rate: unnecessary replacement visits fell from 18.8 percent to 2 percent (a hit rate of 98 percent)

Cost savings: 10.7 percent of funds saved per successful replacement
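
To make the hit-rate figures concrete, the short calculation below converts an unnecessary-visit rate into the expected number of wasted visits per successful replacement. It is a sketch of the arithmetic only and does not reproduce the 10.7 percent savings figure, which depends on per-visit costs not listed here. The function name is hypothetical.

```python
def wasted_visits_per_success(unnecessary_rate):
    """Expected unnecessary visits for each successful replacement."""
    hit_rate = 1.0 - unnecessary_rate
    return unnecessary_rate / hit_rate


before = wasted_visits_per_success(0.188)  # 18.8 percent unnecessary visits
after = wasted_visits_per_success(0.02)    # 2 percent unnecessary visits

print(f"Before: {before:.3f} wasted visits per successful replacement")
print(f"After:  {after:.3f} wasted visits per successful replacement")
```

At an 18.8 percent unnecessary-visit rate, roughly one visit is wasted for every four to five successful replacements; at 2 percent, roughly one for every fifty.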

Actual implementation, as reported in The Atlantic article:

Hit rate above 80 percent near the end of the project

Issues that arose:

  • Model does not account for logistics of crew deployment.
  • The original home inspections were a biased sample, as the homes were selected because they were high-risk.
  • The authors acknowledge that while their model was used in the early stages of home inspections, especially in selecting homes for hydro-vacuum (hydrovac) inspection, it had less impact on replacement decisions, as these choices carried greater political and logistical challenges.

Status:

Terminated

Entered by:

Amber DeJohn, amber.dejohn@mail.utoronto.ca

CEM1002, Civil Engineering, University of Toronto

Contact: msf@eil.utoronto.ca