City:	New York City, New York, United States
Organization:	· Center for Urban Science and Progress, New York University, United States · New York City Department of Sanitation, United States
Project Start Date:	2017
Project End Date:	November 2017
Reference:	Kontokosta, C. E., Hong, B., Johnson, N. E., & Starobin, D. (2018). Using machine learning and small area estimation to predict building-level municipal solid waste generation in cities. Computers, Environment and Urban Systems, 70 (November 2017), 151–162. https://doi.org/10.1016/j.compenvurbsys.2018.03.004
Problem:	Currently, over 80% of the waste collected in New York City is disposed of in landfills at a cost of nearly $400 million a year. To reduce the cost and environmental impact of landfilling, a high-resolution, targeted municipal solid waste management policy is needed.
Technical Solution:	Approach: Develop a predictive model of waste generation at the building level in the dense urban environment of New York City. The analysis combines a socio-spatial model of weekly per capita waste generation with estimates of the occupant population for each of the more than 750,000 residential buildings in the city. Methodology: STEP 1: Weekly waste generation predictive model (31 features). Compare the performance of gradient boosting regression tree (GBRT) and neural network (NN) machine learning algorithms in estimating weekly waste generation for each of the 609 New York City Department of Sanitation (DSNY) sub-sections, and develop the predictive model with the better performer. STEP 2: Individual building population estimation. Estimate individual building populations for all residential properties in NYC by implementing small area estimation methods that combine census population data with specific building characteristics, including type, size, and density. STEP 3: Weekly waste generation at the building level. Multiply the predicted per capita weekly waste generation for each DSNY sub-section (Step 1) by the estimated population of each building located within that sub-section (Step 2) to calculate weekly generation at the building level. STEP 4: Results validation. Validate the results against sub-section and individual collection truck waste data.
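Steps 2 and 3 can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes, purely for illustration, that a census tract's population is split among its buildings in proportion to residential units, whereas the study's small area estimation also draws on building type, size, and density. The field names (`id`, `tract`, `subsection`, `res_units`) and all values are hypothetical.

```python
from collections import defaultdict


def estimate_building_population(buildings, tract_population):
    """Step 2 (simplified): split each tract's population among its
    buildings in proportion to residential units (an assumed weighting)."""
    units_per_tract = defaultdict(float)
    for b in buildings:
        units_per_tract[b["tract"]] += b["res_units"]

    population = {}
    for b in buildings:
        total_units = units_per_tract[b["tract"]]
        share = b["res_units"] / total_units if total_units else 0.0
        population[b["id"]] = tract_population[b["tract"]] * share
    return population


def building_weekly_waste(buildings, population, per_capita_by_subsection):
    """Step 3: building population times the per capita weekly waste
    predicted for the sub-section that contains the building."""
    return {
        b["id"]: population[b["id"]] * per_capita_by_subsection[b["subsection"]]
        for b in buildings
    }


# Hypothetical inputs: two buildings in one tract and one sub-section.
buildings = [
    {"id": "B1", "tract": "T1", "subsection": "S1", "res_units": 10},
    {"id": "B2", "tract": "T1", "subsection": "S1", "res_units": 30},
]
tract_population = {"T1": 100}
per_capita_by_subsection = {"S1": 8.0}  # same unit as the Step 1 prediction

population = estimate_building_population(buildings, tract_population)
waste = building_weekly_waste(buildings, population, per_capita_by_subsection)
```

With these inputs the tract's 100 residents are split 25 and 75 between the two buildings, and each building's weekly waste is its population times the sub-section's per capita rate.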
Datasets Used:	Data set 1: Waste collection tonnage data (232 sections and 609 sub-sections), 2013-2016. From Department of Sanitation. Data set 2: Urban form (MapPLUTO), 2016. From Department of City Planning. Data set 3: American Community Survey (ACS), 2015. From Department of Commerce. Data set 4: DOE school enrollment data, 2016. From Department of Education. Data set 5: Local Law 84 data, 2016. From Mayor's Office of Sustainability. Data set 6: Weather data, 2013-2016. From Weather Underground. Data set 7: Holiday data. From Python package.
Outcome:	GBRT and NN comparison: The GBRT model yields an R-squared of 0.87 and an RMSE of 0.034; the NN model yields an R-squared of 0.77 and an RMSE of 0.050. GBRT was therefore selected for model development. Predictive model performance and feature importance: - All 4 models (total waste, refuse, MGP (metal, glass, and plastic), and paper) perform well, with high R-squared values (mean of 0.81) and a high proportion of samples with less than 20% mean absolute error (MAE). The reduced performance of the recycling models (MGP and paper) reflects the significant spatial variations in recycling behavior. - The most important features are weather, residential building type and density, and demographic variables. Validation: - Projected total waste and refuse generation is within 10% of actual waste generation in 83% of the sub-sections. - In the two collection truck validation cases, the model achieved 99.8% and 93.9% prediction accuracy. Policy implications: - DSNY can develop more efficient routing schedules for its collection trucks, reducing the total vehicle- and person-hours for collection. - The ability to detect areas that have higher or lower tendencies to recycle permits targeted outreach programs.
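The sub-section validation figure (projections within 10% of actual generation) is a share-within-tolerance measure. The sketch below shows one way such a share could be computed. The function name, the sub-section identifiers, and the tonnages are hypothetical, and the study's exact procedure may differ.

```python
def share_within_tolerance(predicted, actual, tolerance=0.10):
    """Fraction of sub-sections whose predicted value lies within
    `tolerance` (relative error) of the actual value."""
    hits = 0
    counted = 0
    for key, actual_value in actual.items():
        # Skip sub-sections with no prediction or a zero denominator.
        if key not in predicted or actual_value == 0:
            continue
        counted += 1
        relative_error = abs(predicted[key] - actual_value) / actual_value
        if relative_error <= tolerance:
            hits += 1
    return hits / counted if counted else 0.0


# Hypothetical weekly tonnages for four sub-sections.
actual = {"S1": 100.0, "S2": 200.0, "S3": 50.0, "S4": 80.0}
predicted = {"S1": 105.0, "S2": 230.0, "S3": 49.0, "S4": 80.0}

share = share_within_tolerance(predicted, actual)
```

Here three of the four hypothetical sub-sections fall within 10% (S2 is off by 15%), so the share is 0.75.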
Issues that arose:	Issue 1: Individual geographies have different collection schedules (twice or three times per week), so the daily collection data were aggregated to weekly totals to compare DSNY sub-sections across the city on a common temporal scale. Issue 2: Route validation was conducted for only 2 representative DSNY sub-sections because the route data are limited and were received as text in PDF form. Individual truck routes, and the buildings adjacent to those routes, had to be digitized using an algorithm developed to extract the text and spatially join each street segment and route location.
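The weekly aggregation described in Issue 1 can be sketched as follows. This is an illustrative assumption rather than the project's code: it groups daily tonnage by ISO calendar week, and the record layout, sub-section identifiers, and tonnages are hypothetical.

```python
from collections import defaultdict
from datetime import date


def aggregate_weekly(daily_records):
    """Sum daily tonnage into (sub-section, ISO year, ISO week) totals so
    that sub-sections with different pickup days share one time scale."""
    weekly = defaultdict(float)
    for subsection, day, tons in daily_records:
        iso_year, iso_week, _weekday = day.isocalendar()
        weekly[(subsection, iso_year, iso_week)] += tons
    return dict(weekly)


# Hypothetical records: (sub-section id, collection date, tons collected).
daily_records = [
    ("S1", date(2016, 3, 7), 12.5),   # Monday, ISO week 10
    ("S1", date(2016, 3, 10), 10.0),  # Thursday, ISO week 10
    ("S1", date(2016, 3, 14), 11.0),  # Monday, ISO week 11
    ("S2", date(2016, 3, 8), 9.0),    # Tuesday, ISO week 10
]

weekly_totals = aggregate_weekly(daily_records)
```

Keying on the ISO year as well as the week number keeps weeks that straddle a year boundary from being merged with the wrong year.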
Status:	The model is set to be improved, as DSNY is providing accurate information on building occupancies at high temporal resolution and specific information on the waste set-out (pick-up/drop-off) point for each building.
Entered by:	Marc Saleh 1001300442 Marc.saleh@mail.utoronto.ca