City: |
New York City, New
York, United States |
Organization: |
· Center for Urban Science and Progress, New York University, United States
· New York City Department of Sanitation, United States |
Project Start Date: |
2017 |
Project End Date: |
November 2017 |
Reference: |
Kontokosta, C. E., Hong, B., Johnson, N. E., & Starobin, D. (2018). Using machine learning and
small area estimation to predict building-level municipal solid waste
generation in cities. Computers, Environment and Urban Systems, 70,
151-162. https://doi.org/10.1016/j.compenvurbsys.2018.03.004 |
Problem: |
Currently, over 80% of the waste collected in New York City is disposed of in landfills at a
cost of nearly $400 million a year. To reduce the cost and
environmental impact of landfilling, a high-resolution, targeted municipal
solid waste management policy is needed. |
Technical Solution: |
Approach: Develop a predictive model for building-level waste generation in the dense urban environment of New York City. The analysis combines a socio-spatial model of weekly per capita waste generation with estimates of the occupant population for each of the more than 750,000 residential buildings in the City.
Methodology:
STEP 1: Weekly waste generation predictive model (31 features). Develop a predictive model by comparing the performance of gradient boosting regression tree (GBRT) and neural network (NN) machine learning algorithms in estimating weekly waste generation for each of the 609 DSNY sub-sections.
STEP 2: Individual building population estimation. Estimate individual building populations for all residential properties in NYC by implementing small area estimation methods that combine census population data with specific building characteristics, including type, size, and density.
STEP 3: Weekly waste generation at the building level. Multiply the predicted per capita weekly waste generation for each DSNY sub-section (step 1) by the estimated population of a given building located within that sub-section (step 2) to calculate weekly waste generation at the building level.
STEP 4: Results validation using sub-section and individual collection truck waste data. |
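Steps 2 and 3 can be illustrated with a minimal sketch. It assumes a simple proportional allocation of a census area's population to its buildings by number of residential units; this is a simplified stand-in for the small area estimation described above, not the study's actual method. The function names, building IDs, and all numbers are hypothetical.

```python
def allocate_population(census_population, units_by_building):
    """Split a census-area population across buildings in proportion to
    their residential units (an assumed, simplified allocation rule)."""
    total_units = sum(units_by_building.values())
    if total_units == 0:
        return {b: 0.0 for b in units_by_building}
    return {
        b: census_population * units / total_units
        for b, units in units_by_building.items()
    }


def building_weekly_waste(per_capita_weekly_waste, building_population):
    """Step 3: multiply the sub-section's per capita weekly waste by the
    estimated population of each building located in that sub-section."""
    return {
        b: per_capita_weekly_waste * pop
        for b, pop in building_population.items()
    }


# Hypothetical census area: 1,000 residents spread over three buildings.
units = {"bldg_A": 10, "bldg_B": 30, "bldg_C": 60}
population = allocate_population(1000, units)

# Hypothetical Step 1 output: 8.0 units of waste per person per week.
waste = building_weekly_waste(8.0, population)
```

Because the allocation is proportional, the building populations sum back to the census total, so the building-level waste figures sum to the sub-section's per capita rate times its population.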
Datasets Used: |
Data set 1: Waste collection tonnage data (232 sections and 609 sub-sections), 2013-2016. From Department of Sanitation.
Data set 2: Urban form (MapPLUTO), 2016. From Department of City Planning.
Data set 3: American Community Survey (ACS), 2015. From Department of Commerce.
Data set 4: DOE school enrollment data, 2016. From Department of Education.
Data set 5: Local Law 84 data, 2016. From Mayor's Office of Sustainability.
Data set 6: Weather data, 2013-2016. From Weather Underground.
Data set 7: Holiday data. From Python package. |
Outcome: |
GBRT and NN comparison: The GBRT model yields an R-squared of 0.87 and an RMSE of 0.034; the NN model yields an R-squared of 0.77 and an RMSE of 0.050. GBRT was therefore selected for model development.
Predictive model performance and feature importance:
- All 4 models (total waste, refuse, MGP, paper) perform well, with high R-squared values (mean of 0.81) and a high proportion of samples with less than 20% mean absolute error (MAE). The reduced performance of the recycling models reflects the significant spatial variation in recycling behavior.
- The most important features are weather, residential building type and density, and demographic variables.
Validation:
- Projected total waste and refuse generation is within 10% of actual waste generation in 83% of the sub-sections.
- In the two collection truck validation cases, the model achieved 99.8% and 93.9% prediction accuracy.
Policy implications:
- DSNY can develop more efficient routing schedules for its collection trucks, reducing the total vehicle- and person-hours for collection.
- The ability to detect areas that have higher or lower tendencies to recycle permits targeted outreach programs. |
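The sub-section validation result is a share of sub-sections whose projection falls within a tolerance of the observed value. A minimal sketch of that calculation follows; the function name and the tonnage values are hypothetical and are not the study's code or data.

```python
def share_within_tolerance(predicted, actual, tolerance=0.10):
    """Fraction of sub-sections whose predicted value lies within
    `tolerance` (relative error) of the actual value.
    Sub-sections with an actual value of zero are skipped."""
    hits = 0
    counted = 0
    for sub_section, observed in actual.items():
        if observed == 0:
            continue
        counted += 1
        relative_error = abs(predicted[sub_section] - observed) / observed
        if relative_error <= tolerance:
            hits += 1
    return hits / counted if counted else 0.0


# Hypothetical weekly tonnage for four sub-sections.
actual = {"S1": 100.0, "S2": 200.0, "S3": 50.0, "S4": 80.0}
predicted = {"S1": 105.0, "S2": 230.0, "S3": 48.0, "S4": 79.0}

share = share_within_tolerance(predicted, actual)
```

In this toy example only S2 misses the 10% band (its relative error is 15%), so three of the four sub-sections count as hits.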
Issues that arose: |
Issue 1: Individual geographies have different collection schedules (two or three times per week), so the daily collection data were aggregated to weekly totals to compare DSNY sub-sections across the city on a common temporal scale.
Issue 2: Route validation was conducted for only 2 representative DSNY sub-sections because the data are limited and were received in text (PDF) form. Individual truck routes, and the buildings adjacent to those routes, had to be digitized using an algorithm developed to extract the text and spatially join each street segment and its route location. |
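The weekly aggregation in Issue 1 amounts to summing daily tonnage per sub-section within each week. The sketch below assumes ISO calendar weeks as the weekly bins (the source does not state how week boundaries were defined); the record layout, function name, and values are hypothetical.

```python
from collections import defaultdict
from datetime import date


def aggregate_weekly(daily_records):
    """Sum daily tonnage into (sub_section, ISO year, ISO week) totals."""
    weekly = defaultdict(float)
    for sub_section, day, tons in daily_records:
        iso_year, iso_week, _ = day.isocalendar()
        weekly[(sub_section, iso_year, iso_week)] += tons
    return dict(weekly)


# Hypothetical records: S1 is collected twice a week, S2 three times.
records = [
    ("S1", date(2016, 3, 7), 12.0),
    ("S1", date(2016, 3, 10), 10.5),
    ("S2", date(2016, 3, 8), 8.0),
    ("S2", date(2016, 3, 10), 7.5),
    ("S2", date(2016, 3, 12), 9.0),
    ("S1", date(2016, 3, 14), 11.0),  # falls in the following ISO week
]

weekly_totals = aggregate_weekly(records)
```

Once every sub-section has one total per week, sub-sections with different collection frequencies can be compared directly.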
Status: |
The model is set to be improved, as DSNY is providing accurate information on building
occupancies at high temporal resolution and specific information on the waste
set-out (pick-up/drop-off) point for each building. |
Entered by: |
Marc Saleh 1001300442 |