Traffic Prediction in a Bike Sharing System

City:	New York City, New York, USA; Washington, DC, USA
Organization:	Microsoft Corporation
Project Start Date:	Unknown
Project End Date:	November 3, 2015
Reference:	Li, Y., Zheng, Y., Zhang, H., Chen, L. (2015), Traffic Prediction in a Bike Sharing System, Proceedings of the 23rd SIGSPATIAL International Conference on Advances in Geographic Information Systems, Article No. 33.
Problem:	Rentals and returns at bike share stations are unbalanced, leaving some stations without bikes and others with too many. Available models typically focus narrowly on one variable to predict traffic patterns within bike shares, whereas in reality many variables affect traffic within the bike network. This leads to inaccurate predictions and inefficiencies within the network.
Technical Solution:	The team first groups the stations into clusters using a bipartite clustering algorithm. Next, the model predicts the total traffic across the entire bike share system using a gradient boosting regression tree (GBRT) trained on historical trip data and meteorological data. Finally, the model allocates the predicted total to and from each cluster using a multi-similarity-based inference model with three variables: time, weather, and temperature/wind.
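The final allocation step can be illustrated with a minimal Python sketch. It is not the authors' model: the feature tuple, the `similarity` function, its weights, and the sample history are all hypothetical, chosen only to show how a system-wide total might be divided among clusters using similarity-weighted historical proportions.

```python
from math import exp

def similarity(a, b, bandwidth=5.0):
    """Hypothetical similarity between two hours.

    Each hour is (is_weekday, weather_label, temperature_celsius).
    The three factors loosely mirror time, weather, and temperature;
    the specific weights are assumptions, not values from the paper.
    """
    same_day_type = 1.0 if a[0] == b[0] else 0.0
    same_weather = 1.0 if a[1] == b[1] else 0.5
    temp_kernel = exp(-((a[2] - b[2]) ** 2) / (2 * bandwidth ** 2))
    return same_day_type * same_weather * temp_kernel

def cluster_proportions(history, target):
    """Similarity-weighted average of each cluster's share of traffic.

    history: list of (features, {cluster_id: check_outs}) for past hours.
    target:  features of the hour being predicted.
    """
    weighted = {}
    total_weight = 0.0
    for features, counts in history:
        w = similarity(features, target)
        hour_total = sum(counts.values())
        if w == 0.0 or hour_total == 0:
            continue  # dissimilar or empty hours contribute nothing
        total_weight += w
        for cluster, n in counts.items():
            weighted[cluster] = weighted.get(cluster, 0.0) + w * n / hour_total
    if total_weight == 0.0:
        raise ValueError("no similar historical hours found")
    return {c: v / total_weight for c, v in weighted.items()}

def allocate(total_predicted, proportions):
    """Split the predicted system-wide total across clusters."""
    return {c: total_predicted * p for c, p in proportions.items()}

# Made-up history: three past hours and their per-cluster check-outs.
history = [
    ((True, "sunny", 25.0), {"A": 60, "B": 40}),
    ((True, "rain", 18.0), {"A": 30, "B": 70}),
    ((False, "sunny", 25.0), {"A": 10, "B": 90}),
]
target = (True, "sunny", 25.0)

proportions = cluster_proportions(history, target)
allocation = allocate(200.0, proportions)  # 200.0 stands in for the predicted total
```

Because each historical hour contributes shares that sum to one, the resulting proportions also sum to one, so the allocated values always add up to the predicted total.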
Datasets Used:	DC Bike Data, https://www.microsoft.com/en-us/research/publication/traffic-prediction-in-a-bike-sharing-system/, April 1 - September 30, 2014; DC Meteorological Data, https://www.microsoft.com/en-us/research/publication/traffic-prediction-in-a-bike-sharing-system/, April 1 - September 30, 2014; NY Bike Data, https://www.microsoft.com/en-us/research/publication/traffic-prediction-in-a-bike-sharing-system/, April 1 - September 30, 2014; NY Meteorological Data, https://www.microsoft.com/en-us/research/publication/traffic-prediction-in-a-bike-sharing-system/, April 1 - September 30, 2014
Outcome:	A 0.03 reduction in error rate (ER) across all hours and a 0.18/0.23 reduction in ER for anomalous hours. ER measures the amount of error a model produces, using a loss function that compares the model's predictions to ground-truth measurements. Data from April 1 to September 10 was used for training, and the remaining data through September 30 was used to test the model's prediction accuracy.
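The evaluation setup described above can be sketched in Python. This record does not give the ER formula, so the definition below (total absolute error divided by total observed traffic) is an assumption, and the function names and sample records are hypothetical.

```python
from datetime import date

def split_by_date(records, cutoff):
    """Chronological split: records on or before cutoff train, later ones test.

    records: list of (day, value) tuples, where day is a datetime.date.
    """
    train = [r for r in records if r[0] <= cutoff]
    test = [r for r in records if r[0] > cutoff]
    return train, test

def error_rate(predicted, actual):
    """Assumed ER: sum of absolute errors over the sum of observed values."""
    total_actual = sum(actual)
    if total_actual == 0:
        raise ValueError("observed traffic sums to zero")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / total_actual

# Made-up daily counts spanning the April 1 - September 30, 2014 window.
records = [
    (date(2014, 4, 1), 120),
    (date(2014, 9, 10), 95),
    (date(2014, 9, 11), 130),
    (date(2014, 9, 30), 80),
]
train, test = split_by_date(records, cutoff=date(2014, 9, 10))

# Two predictions that are each off by 10 against observations of 100.
example_er = error_rate([90, 110], [100, 100])  # 20 / 200
```

Splitting by date rather than at random keeps the test period strictly after the training period, which matches how a forecasting model would be used in practice.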
Issues that arose:	Different models were applied in different scenarios according to their performance in each. The hierarchical multi-similarity inference method was used for common hours, and the check-in method using bipartite clustering, inter-cluster transition, and trip duration was used for anomalous hours. Anomalous hours were identified manually in the data; in the future the team plans to define anomalous data as data lying a certain number of standard deviations from the predicted value. The team currently has no technical way of determining the number of clusters to use: too many clusters leads to unpredictability and large error margins, whereas too few clusters offers no value to bike share users, who may have to walk unreasonable distances within a cluster. The model does not predict the number of bikes at each station within a cluster, so an individual station in the cluster can still be empty or full.
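The planned standard-deviation rule for anomalous hours might look like the following Python sketch. The threshold `k`, the use of the population standard deviation of the residuals, and the sample values are assumptions made for illustration; the record does not say how the team intends to set them.

```python
from statistics import pstdev

def flag_anomalous_hours(actual, predicted, k=3.0):
    """Flag hours whose observation lies more than k standard deviations
    from the predicted value.

    The standard deviation is taken over the residuals (actual - predicted);
    both that choice and the default k are assumptions.
    """
    residuals = [a - p for a, p in zip(actual, predicted)]
    sigma = pstdev(residuals)
    if sigma == 0:
        return [False] * len(residuals)  # every hour matches equally well
    return [abs(r) > k * sigma for r in residuals]

# Made-up hourly counts: the last hour is far above its prediction.
actual = [100, 102, 98, 101, 99, 250]
predicted = [100, 100, 100, 100, 100, 100]
flags = flag_anomalous_hours(actual, predicted, k=2.0)
```

In a short series like this one, a single large outlier inflates the standard deviation, so a lower `k` is needed to flag it; on a longer history a larger threshold would be more typical.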
Status:	Unknown
Entered by:	Brandon Bradt, brandon.bradt@mail.utoronto.ca

CEM1002, Civil Engineering, University of Toronto

Contact: msf@eil.utoronto.ca