Data Analytics for Smart Parking Applications

City:

Barcelona, Spain

Padua, Italy

Organization:

Centre Tecnològic de Telecomunicacions de Catalunya (CTTC)

Department of Information Engineering (DEI), University of Padova

Internet Interdisciplinary Institute (IN3), Universitat Oberta de Catalunya (UOC)

Project Start Date:

December 2014 (Start of Data Collection)

Project End Date:

September 23, 2016 (Paper Published)

Reference:

Problem Definition:

Problem: Sensors detect whether a parking spot is occupied, but the resulting data is unsorted, and faulty sensors may report incorrect readings.

Objective: Apply data analytics that not only sort parking spaces by their average use, but also detect outliers in the data that indicate faulty or broken parking sensors.

Technical Solution:

Four different methods were used to classify the data and to detect outliers:

  • K-means: a popular unsupervised clustering technique.
  • EM (Expectation-Maximization): an unsupervised clustering technique.
  • DBSCAN (Density-Based Spatial Clustering of Applications with Noise): an unsupervised clustering technique.
  • SOM (Self-Organizing Maps): an unsupervised clustering technique; the SOM-based approach was newly developed for this application.
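
To illustrate what "unsupervised" means for the methods listed above, the following is a minimal sketch of k-means (Lloyd's algorithm) in plain Python. It is not the implementation used in the paper. The two features per parking space (fraction of time occupied, mean parking duration in hours) and the sample values are assumptions made for illustration only, and the centroids are initialised from the first k points to keep the example deterministic.

```python
import math

def kmeans(points, k, max_iterations=100):
    """Plain Lloyd's k-means. Returns (centroids, labels)."""
    # Assumption for this sketch: deterministic start from the first k points.
    centroids = [tuple(p) for p in points[:k]]
    labels = None
    for _ in range(max_iterations):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return centroids, labels

# Hypothetical per-space features: (occupied fraction, mean parking hours).
points = [(0.10, 1.0), (0.90, 8.0), (0.15, 1.5),
          (0.85, 7.5), (0.12, 1.2), (0.88, 8.2)]
centroids, labels = kmeans(points, k=2)
print(labels)
```

No class labels are supplied at any point: the groups emerge only from distances between the feature vectors. In practice the features would likely need to be scaled to comparable ranges before clustering.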

Datasets Used:

  • Data Set 1: Synthetic Data - Classification Performance with Varying Number of Clusters: artificially created dataset with a predefined number of clusters.
  • Data Set 2: Synthetic Data - Classification Performance with Outliers and Complex Statistics: artificially created dataset designed to closely resemble the statistics of real parking events.
  • Data Set 3: Classification Performance on Real Data: a dataset from the wireless sensor network (WSN) parking sensor deployment by Worldsensing. Data was collected from 370 wireless sensors over a 6-month period, from December 1, 2014 to May 30, 2015. Each sensor detected whether a vehicle occupied its space and recorded each reading along with a timestamp.
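
Timestamped occupancy readings like those in Data Set 3 can be condensed into per-sensor statistics before clustering. The sketch below shows one possible way to do this. The record layout `(sensor_id, timestamp, occupied)`, the function name `occupancy_features`, and the choice of features are assumptions for illustration; the source does not describe the actual data format or the features used in the paper.

```python
from collections import defaultdict
from datetime import datetime

def occupancy_features(events, window_start, window_end):
    """Return {sensor_id: (occupied_fraction, arrivals)}.

    events: iterable of (sensor_id, timestamp, occupied) tuples, where
    occupied is True when a vehicle arrives and False when it leaves.
    """
    by_sensor = defaultdict(list)
    for sensor_id, ts, occupied in events:
        by_sensor[sensor_id].append((ts, occupied))

    total = (window_end - window_start).total_seconds()
    features = {}
    for sensor_id, readings in by_sensor.items():
        readings.sort()
        busy, arrivals = 0.0, 0
        state, since = False, window_start  # assumption: space starts empty
        for ts, occupied in readings:
            if state:
                busy += (ts - since).total_seconds()
            if occupied and not state:
                arrivals += 1
            state, since = occupied, ts
        if state:  # still occupied when the window closes
            busy += (window_end - since).total_seconds()
        features[sensor_id] = (busy / total, arrivals)
    return features

# Hypothetical readings for two sensors over a four-hour window.
day = datetime(2014, 12, 1)
events = [
    ("s1", day.replace(hour=8), True),
    ("s1", day.replace(hour=10), False),
    ("s1", day.replace(hour=11), True),
    ("s1", day.replace(hour=12), False),
    ("s2", day.replace(hour=11), True),
]
features = occupancy_features(events, day.replace(hour=8), day.replace(hour=12))
print(features)
```

A sensor whose features sit far from every group of normally behaving sensors, for example one that reports a space as occupied for the whole window, is a candidate for the faulty-sensor outliers the study aims to detect.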

Outcome:

The SOM-based unsupervised clustering technique proved the most effective at clustering the data and detecting outliers. It produced a separate cluster that contained only outliers, and these directly matched the manually identified outlier data points. EM and DBSCAN clustered the data acceptably but could not separate out the outliers. K-means was the least effective at clustering and was also unable to detect outliers. These results held for all three datasets.
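
The comparison between an algorithm's outlier cluster and a manually identified set of outliers can be expressed as a simple set comparison. The following is a minimal sketch of that check. The function name `outlier_agreement` and the sensor identifiers are hypothetical and are not taken from the paper.

```python
def outlier_agreement(detected, manual):
    """Compare detected outliers with a manually labelled reference set.

    Returns (precision, recall, false_alarms, missed).
    """
    detected, manual = set(detected), set(manual)
    hits = detected & manual
    precision = len(hits) / len(detected) if detected else 0.0
    recall = len(hits) / len(manual) if manual else 0.0
    return precision, recall, detected - manual, manual - detected

# Hypothetical sensor IDs: members of the outlier cluster vs. manual labels.
detected = {"s017", "s203", "s311"}
manual = {"s017", "s203", "s311"}
precision, recall, false_alarms, missed = outlier_agreement(detected, manual)
print(precision, recall, false_alarms, missed)
```

A precision and recall of 1.0 with empty `false_alarms` and `missed` sets corresponds to the outlier cluster exactly matching the manual labels.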

Issues that arose:

There were no issues with the datasets used, although issues could occur with future datasets:

  • As the number of clusters increases, correctly classifying the data becomes more difficult. This could be an obstacle for larger datasets with wider ranges in the data.
  • For significantly larger datasets, the algorithms would need to be adjusted, as they are computationally expensive and time consuming.

Status:

An online search found no further work on this topic or with the developed algorithm since the paper was published.

Entered by:

Jacob Malleau 999761707

jacob.malleau@mail.utoronto.ca

September 28, 2018



CEM1002,

Civil Engineering, University of Toronto

Contact: msf@eil.utoronto.ca