Real-Time Forecasting of On-Time Performance in Metrorail Systems

City:	Washington, D.C., USA
Organization:	Washington Metropolitan Area Transit Authority (WMATA); Virginia Tech
Project Start Date:	Winter 2017
Project End Date:	August 2018 (Published)
Reference:	Soper, R.R., Flowers, B., Bijinemula, S.K., Mayer, B., Khaghani, F., Holt, J.L., & Ramakrishnan, N. (2018). Real-time Forecasting of On-Time Performance in Metrorail Systems.
Problem:	WMATA currently makes all real-time control decisions based on dispatchers' assessment of the current network state. Some alerts are triggered by thresholds, but the process is otherwise almost entirely manual, and the system has no predictive capability. This can add delays through human error, force WMATA to be reactive rather than proactive, and reduce transparency while increasing rider frustration.
Technical Solution:	The research team used predictive modelling to improve accuracy and transparency and to allow WMATA dispatchers and operators to mitigate issues before they impact passengers. Machine learning was used to answer three main questions: (1) Can delays be correlated over time and space? (2) Can real-time passenger information be used as an indicator of the delays that customers are facing on the network? (3) Can the current state of the network be used to forecast delays that will impact riders who have not yet entered the network?
The first question was answered using the Pearson product-moment correlation and k-spectral clustering. Delays at a given station were found to impact customers at nearby stations the most. Based on the clustering, the team suggested that WMATA provide detailed delay information only to the cluster impacted, which would allow riders who have not yet entered the system to avoid that cluster if possible.
The second question was answered with a model built on the state vector, time, day, and Late Trip Percentage attributes. The researchers found that the chosen predictors explain about 70% of the variance in late trips.
The final question was answered using a sequence-to-sequence (seq2seq) RNN, an autoencoder, and the k-nearest neighbour (kNN) algorithm. The sequences were kept short: the model predicts the state 15 and 30 minutes into the future from the present state and the states 15 and 30 minutes in the past. The model was evaluated on 13,004 sequences, with 20% withheld from seq2seq training as a test set. Accuracy was 0.962 for the 15-minute forecast and 0.958 for the 30-minute forecast. Next, an autoencoder was set up and trained on the 30-minute forecast state. The kNN algorithm was chosen because the encoding was expected to group states into clusters that correspond to the resulting classification.
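The windowing described for the forecasting step can be illustrated with a minimal sketch. It assumes one network-state vector per 15-minute step and a chronological 80/20 hold-out; the names make_windows and split_train_test, the toy data, and the choice of a chronological rather than random split are assumptions for illustration and are not taken from the paper.

```python
from typing import List, Sequence, Tuple

State = List[float]  # hypothetical: one network-state vector per 15-minute step
Pair = Tuple[List[State], List[State]]


def make_windows(states: Sequence[State], n_past: int = 2, n_future: int = 2) -> List[Pair]:
    """Pair each input window (t-30, t-15, t) with its targets (t+15, t+30)."""
    pairs: List[Pair] = []
    for t in range(n_past, len(states) - n_future):
        inputs = list(states[t - n_past : t + 1])          # past steps plus the present
        targets = list(states[t + 1 : t + 1 + n_future])   # future steps to predict
        pairs.append((inputs, targets))
    return pairs


def split_train_test(pairs: List[Pair], test_fraction: float = 0.2) -> Tuple[List[Pair], List[Pair]]:
    """Hold out the last fraction of pairs for testing (assumed chronological split)."""
    n_test = int(round(len(pairs) * test_fraction))
    cut = len(pairs) - n_test
    return pairs[:cut], pairs[cut:]


# Toy series: 12 time steps, each with a single-feature state (made-up values).
states = [[float(i)] for i in range(12)]
pairs = make_windows(states)
train, test = split_train_test(pairs)
print(len(pairs), len(train), len(test))
```

Each pair would then feed a sequence model: three input steps in, two forecast steps out. The model itself is omitted here because the source does not describe its architecture in detail.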
The team used the median of 0.4, obtained after standardizing the delays, to separate minor delays due to normal operation from severe delays due to unpredicted events. The classifier was trained on 80% of the sequences/predictions together with their late-customer percentage and tested for accuracy on the remaining 20%. It had difficulty differentiating among minor delays because of the variability of normal Metro operation. This limitation matters less in practice, because WMATA is more interested in severe delays, that is, those above 0.4. The team adopted a three-nearest-neighbour classifier because it was the classifier with the lowest value near the inflection point of 0.4.
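The thresholding and three-nearest-neighbour vote described above can be sketched as follows. This is an illustrative toy, not the team's implementation: the 2-D points stand in for autoencoder codes, the late-customer values are made up, and the function names label_delay and knn_predict are hypothetical. Only the 0.4 threshold and k = 3 come from the text.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple

SEVERE_THRESHOLD = 0.4  # median of the standardized delays, as reported in the text


def label_delay(standardized_late_pct: float) -> str:
    """Label a standardized late-customer value as 'severe' (above 0.4) or 'minor'."""
    return "severe" if standardized_late_pct > SEVERE_THRESHOLD else "minor"


def knn_predict(points: Sequence[Tuple[float, ...]], labels: Sequence[str],
                query: Tuple[float, ...], k: int = 3) -> str:
    """Majority vote among the k training points closest to the query (Euclidean)."""
    ranked = sorted(zip(points, labels), key=lambda pl: math.dist(pl[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical encoded states (stand-ins for autoencoder codes) and delay values.
codes: List[Tuple[float, float]] = [
    (0.10, 0.20), (0.20, 0.10), (0.15, 0.15),
    (0.90, 0.80), (0.80, 0.90), (0.85, 0.85),
]
late = [0.1, 0.2, 0.3, 0.7, 0.9, 0.6]
labels = [label_delay(v) for v in late]

print(knn_predict(codes, labels, (0.82, 0.88)))  # query near the high-delay codes
print(knn_predict(codes, labels, (0.12, 0.18)))  # query near the low-delay codes
```

With an odd k and two classes the vote cannot tie, which is one practical reason to prefer k = 3 over an even value.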
Datasets Used:	Dataset 1: Tap-In and Tap-Out Passenger Data, WMATA, May, August, September, and October 2015; Dataset 2: WMATA Travel Time Standards, WMATA, 2015; Dataset 3: Metro Train Movements, WMATA, March, May, August, and October 2015
Outcome:	The team created a solid base of algorithms, neural networks, encoders, and classifiers that can give WMATA and Metro riders more accurate delay information than was previously available.
Issues that arose:	The data sets were large and cumbersome, and manual coding errors and corrupted files left them dirty. These limitations prevented the team from developing more than a base model. The team recommends that additional data sets, such as maintenance records and incident reports, be used in future research to develop a more complete picture and increase accuracy.
Status:	Terminated
Entered by:	Nicholas Stern; nicholas.stern@utoronto.ca; September 28th, 2019

CEM1002,
Civil Engineering, University of Toronto
Contact: msf@eil.utoronto.ca