Mining Smart Card Data for Transit Riders’ Travel Patterns

City:	Beijing, China
Organizations:	Department of Civil and Environmental Engineering, University of Washington, Seattle; Department of Civil Engineering and Engineering Mechanics, University of Arizona, Tucson; Beijing Transportation Research Center
Project Start Date:	5 July 2010 (start of data collection period)
Project End Date:	18 July 2013 (article accepted by Transportation Research Part C: Emerging Technologies)
Reference:	Ma, X., Wu, Y., Wang, Y., Chen, F. and Liu, J. (2013). Mining Smart Card Data for Transit Riders' Travel Patterns. Transportation Research Part C: Emerging Technologies, 36, pp.1-12.
Problem:	Local transit agencies in Beijing want to improve transit services and develop marketing strategies that attract new riders and retain existing ones. Traditionally, evidence-based transit planning has relied largely on household surveys, interviews, or travel diaries, which are costly and typically capture riders' travel patterns for only a single day.
Technical Solution:	The authors propose a series of data-mining procedures that use multi-day smart card data to analyze each transit user's historical travel pattern and determine its regularity, or "frequency of the similar trips" (X. Ma et al. 2013):
1. Missing disembarkation information in each user's series of daily transit trips (e.g. riders exiting a flat-fare bus are not required to tap their smart card) is first estimated using a Markov chain-based Bayesian decision tree algorithm and GPS data.
2. Each user's daily transit trips are then linked into a trip chain using fixed average transit transfer and ride times from a 2010 Beijing Transport Survey, 25.4 and 40 minutes, respectively (X. Ma et al. 2013).
3. Once trip chains are estimated, the density-based spatial clustering of applications with noise (DBSCAN) algorithm is used to identify each user's recurring travel patterns: unusual trips in a user's week are flagged as noise, and different routes a user took to the same location are grouped into the same cluster through their shared stops.
4. The K-Means++ clustering algorithm is applied to group transit users by the regularity of their travel patterns. The data are first standardized by scaling each attribute by its range to obtain values between 0 and 1, and are then cleaned to correct incorrectly recorded transaction times.
5. Because the authors suggest K-Means++ alone may not be suitable for larger data sets, its clustering results are instead used as training data for a rough set theory classifier.
6. The accuracy and speed of the rough set algorithm are compared with those of other classification algorithms: K-Nearest Neighbour, Decision Tree, Naive Bayes Classifier and a neural network with three hidden layers.
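Step 4 can be illustrated with a minimal, self-contained sketch. This is not the authors' code: it assumes that "scaling by the range" means min-max normalization (subtract each attribute's minimum, then divide by its range), implements K-Means++ as k-means++ seeding followed by standard Lloyd iterations in plain Python, and uses hypothetical per-card attributes (days travelled, trips in repeated clusters, trips flagged as noise) as stand-ins for the attributes the authors derived from the DBSCAN output.

```python
import random

def min_max_scale(rows):
    """Rescale every attribute (column) to [0, 1] using its range (assumed min-max normalization)."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    # A constant column has range 0; use 1.0 so it maps to 0.0 instead of dividing by zero.
    spans = [(max(c) - min(c)) or 1.0 for c in cols]
    return [[(v - lo) / s for v, lo, s in zip(row, lows, spans)] for row in rows]

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans_pp(points, k, iters=100, seed=0):
    """K-Means with k-means++ seeding; returns (labels, centres)."""
    rng = random.Random(seed)
    # k-means++ seeding: first centre uniformly at random, each further centre
    # drawn with probability proportional to its squared distance from the
    # nearest centre chosen so far.
    centres = [list(rng.choice(points))]
    while len(centres) < k:
        d2 = [min(sq_dist(p, c) for c in centres) for p in points]
        total = sum(d2)
        if total == 0:  # every point coincides with an existing centre
            centres.append(list(rng.choice(points)))
            continue
        r, acc = rng.uniform(0, total), 0.0
        for p, w in zip(points, d2):
            acc += w
            if acc >= r:
                centres.append(list(p))
                break
    # Lloyd iterations: assign each point to its nearest centre, then move each
    # centre to the mean of its members, until the centres stop changing.
    labels = []
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: sq_dist(p, centres[j])) for p in points]
        new_centres = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            new_centres.append([sum(col) / len(members) for col in zip(*members)]
                               if members else centres[j])
        if new_centres == centres:
            break
        centres = new_centres
    return labels, centres

# Hypothetical per-card attributes over one week:
# [days travelled, trips in repeated clusters, trips flagged as noise]
cards = [[5, 9, 0], [5, 8, 1], [4, 9, 0], [1, 0, 4], [1, 1, 5], [2, 0, 4]]
labels, centres = kmeans_pp(min_max_scale(cards), k=2)
print(labels)
```

The fixed seed only makes the sketch reproducible; in practice K-Means++ is usually run several times from different seeds and the result with the lowest within-cluster sum of squares is kept.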
The rough set algorithm proved the most efficient and accurate, and the authors therefore conclude that it is best suited to larger smart card data sets.
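The rough set step can be illustrated with a small decision table whose decision attribute is a regularity label, such as a K-Means++ cluster assignment. This is a minimal sketch, not the classifier the authors used: it assumes the normalized attributes are first discretized into equal-width bins (the `discretize` helper and its bin count are hypothetical), builds one rule per indiscernibility class, skips reduct computation, and falls back to the most frequent label for unseen attribute combinations.

```python
from collections import Counter, defaultdict

def discretize(value, bins=3):
    """Map a value in [0, 1] to an equal-width bin index 0..bins-1 (hypothetical choice)."""
    return min(int(value * bins), bins - 1)

def indiscernibility_classes(rows, attrs):
    """Group object indices that share identical values on the given attributes."""
    classes = defaultdict(list)
    for i, row in enumerate(rows):
        classes[tuple(row[a] for a in attrs)].append(i)
    return classes

def dependency_degree(rows, labels, attrs):
    """gamma(attrs, D) = |POS(D)| / |U|: share of objects whose decision is fully
    determined by attrs, i.e. that lie in the lower approximation of their class."""
    positive = sum(len(ids) for ids in indiscernibility_classes(rows, attrs).values()
                   if len({labels[i] for i in ids}) == 1)
    return positive / len(rows)

class RoughSetClassifier:
    def fit(self, rows, labels, attrs):
        self.attrs = attrs
        self.rules, self.certain = {}, {}
        # Fallback for attribute combinations never seen in training.
        self.default = Counter(labels).most_common(1)[0][0]
        for key, ids in indiscernibility_classes(rows, attrs).items():
            decisions = Counter(labels[i] for i in ids)
            # A consistent class gives a certain rule (lower approximation); an
            # inconsistent one lies in the boundary region and gives only a
            # possible rule, resolved here by majority vote.
            self.rules[key] = decisions.most_common(1)[0][0]
            self.certain[key] = len(decisions) == 1
        return self

    def predict(self, row):
        return self.rules.get(tuple(row[a] for a in self.attrs), self.default)

# Hypothetical discretized attributes per card, with K-Means++-style labels.
rows = [[2, 2], [2, 2], [2, 1], [0, 0], [0, 1], [2, 1]]
labels = ["regular", "regular", "regular", "irregular", "irregular", "irregular"]
clf = RoughSetClassifier().fit(rows, labels, attrs=[0, 1])
print(dependency_degree(rows, labels, [0, 1]))  # 4 of the 6 objects are in the positive region
print(clf.predict([2, 2]), clf.certain[(2, 1)])
```

The dependency degree shows how far the chosen attributes determine the regularity label; a full rough set tool would normally also search for reducts (minimal attribute subsets with the same dependency degree) before generating rules, which is where much of the method's speed on large data sets comes from.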
Datasets Used:	Transaction data from 3,845,444 smart cards used by transit riders in Beijing from Monday, July 5 to Friday, July 9, 2010, and historical speed profiles derived from GPS data (X. Ma et al. 2013). All data were provided by the Beijing Transportation Research Center.
Outcome:	The authors propose an efficient data-mining approach for processing large volumes of smart card transit data to estimate individual users' trip chains and to classify users by the regularity of their travel patterns. Transit agencies and transportation planners can use the resulting information to target specific groups of transit users with marketing strategies and to support travel demand analysis and development.
Issues that arose:	The smart card database does not record the time or location of disembarkation from flat-fare buses or of transfers between subway lines. Some smart card records contain incorrectly recorded transaction times. The volume of smart card data is also very large; at the time the article was written, the authors state: "in Beijing, more than 16 million smart card transaction data points are generated every day" (X. Ma et al. 2013).
Status:	Completed. Recommended future work: compare the proposed method with traditional transit surveys or travel diaries to improve its accuracy, and combine the travel patterns identified by this method with map-based transportation systems to visualize transit performance.
Entered by:	28-Sept-2018: Allison Bennett, Allison.Bennett@mail.utoronto.ca

CEM1002,
Civil Engineering, University of Toronto
Contact: msf@eil.utoronto.ca