Urban Crime Prediction

City:

London, UK

Organization:

Telefónica Digital, The Open Data Institute and Massachusetts Institute of Technology

Project Start Date:

September 2013

Project End Date:

September 2014 

Reference:

Bogomolov, A., Lepri, B., Staiano, J., Oliver, N., Pianesi, F., & Pentland, A. (2014, November). Once upon a crime: towards crime prediction from demographics and mobile data. In Proceedings of the 16th international conference on multimodal interaction (pp. 427-434). ACM.

Problem:

Crime sometimes clusters in small geographic areas that do not align with the overall socio-economic characteristics of a neighbourhood. For example, crime may cluster on a single street in a “good” neighbourhood. This study uses a place-centric, data-driven approach to determine whether aggregated mobile network activity is a good predictor of potential crime hotspots. The findings can inform the police and the city about where to invest resources and how to shorten response times.


Technical Solution:

Supervised learning:

Training data — 80%

Test data — 20%
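As a rough illustration of the 80/20 split listed above, a minimal Python sketch follows. The function name `split_train_test`, the fixed seed and the integer cell ids are illustrative assumptions, not details taken from the study.

```python
import random


def split_train_test(rows, test_fraction=0.2, seed=0):
    """Shuffle rows reproducibly and return (train, test) lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    # The first n_test shuffled rows form the test set, the rest the training set.
    return shuffled[n_test:], shuffled[:n_test]


cells = list(range(100))  # hypothetical cell ids
train, test = split_train_test(cells)
print(len(train), len(test))
```

Fixing the seed keeps the split reproducible, so that different models are evaluated on the same held-out cells.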

Methods used to select the variables:

  • Pearson correlation analysis
  • Principal component analysis
  • Feature ranking
  • Feature subset selection
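The first of the methods above, Pearson correlation analysis, can be sketched as a simple feature ranking. This is only an illustration: the feature names, the sample values and the helper `rank_features` are hypothetical, and the study's actual selection pipeline is not reproduced here.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / (spread_x * spread_y)


def rank_features(features, target):
    """Sort feature names by absolute correlation with the target."""
    scores = {name: pearson(values, target) for name, values in features.items()}
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)


# Hypothetical per-cell features and hotspot labels (1 = hotspot).
features = {
    "night_footfall": [10, 20, 30, 40, 50],
    "median_age": [35, 31, 38, 33, 36],
    "constant": [1, 1, 1, 1, 1],
}
target = [0, 0, 1, 1, 1]
ranked = rank_features(features, target)
for name, score in ranked:
    print(name, round(score, 3))
```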

Prediction of crime hotspots uses binary classification to determine whether an area will be a hotspot or not.
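This entry does not state how a hotspot is defined. The sketch below assumes, purely for illustration, that a cell counts as a hotspot when its crime count exceeds the median across all cells; the threshold rule, the function `label_hotspots` and the sample counts are all hypothetical.

```python
import statistics


def label_hotspots(crime_counts, threshold=None):
    """Map each cell to 1 (hotspot) or 0 (not a hotspot).

    If no threshold is given, the median count is used (an assumption).
    """
    if threshold is None:
        threshold = statistics.median(crime_counts.values())
    return {cell: int(count > threshold) for cell, count in crime_counts.items()}


counts = {"cell_a": 2, "cell_b": 15, "cell_c": 7, "cell_d": 30}  # hypothetical
labels = label_hotspots(counts)
print(labels)
```

A median threshold yields roughly balanced classes, which makes plain accuracy easier to interpret; other thresholds would change the class balance.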

Two models, the Borough Profile Model (BPM) and the Smartsteps Model, were tested using each of the following methods:

  • Logistic regression
  • Support vector machine
  • Neural networks
  • Decision trees
  • Random forest: yielded the best result, 70% accuracy using Smartsteps data


Datasets Used:

  • Dataset 1: Criminal case dataset, The Open Data Institute, December 2012 and January 2013
  • Dataset 2: Smartsteps dataset, Telefónica Digital, December 9th to 15th, 2012 and December 23rd, 2012 to January 5th, 2013
    • Mobile network activities
  • Dataset 3: Borough Profile dataset, The Open Data Institute, 2012
    • Neighbourhood characteristics


Outcome:

The Smartsteps model has higher predictive power: its accuracy is 6% higher than that of the BPM (the typical model).

Performance metrics show that the Smartsteps model using the random forest method has 70% accuracy in predicting whether a cell will be a crime hotspot.
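For reference, accuracy here means the share of cells whose predicted label matches the true label. A minimal sketch, using made-up labels rather than the study's data:

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("cannot compute accuracy on empty input")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Hypothetical labels for ten cells (1 = hotspot, 0 = not a hotspot).
y_true = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0, 0, 0]
score = accuracy(y_true, y_pred)
print(score)  # 7 of 10 predictions match, so 0.7
```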

In conclusion, the model using human behavioural data derived from mobile network activity, in combination with demographic data, has strong predictive power for crime hotspots.


Issues that arose:

  • Borough Profile data is expensive and effort-intensive to collect, and it is updated less frequently
  • Only 3 weeks of Smartsteps data were available
  • Crime events were aggregated on a monthly level because no specific date was attached to each crime event in the dataset
    • Finer aggregation at weekly, daily or hourly level is preferred
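Monthly aggregation of the kind described above could be sketched as follows. The event fields `cell` and `month` and the sample records are hypothetical, since the schema of the crime dataset is not given in this entry.

```python
from collections import Counter


def aggregate_monthly(events):
    """Count crime events per (cell, month) pair."""
    return Counter((event["cell"], event["month"]) for event in events)


# Hypothetical events; each record carries only a month, not a specific date.
events = [
    {"cell": "cell_a", "month": "2012-12"},
    {"cell": "cell_a", "month": "2012-12"},
    {"cell": "cell_a", "month": "2013-01"},
    {"cell": "cell_b", "month": "2012-12"},
]
monthly_counts = aggregate_monthly(events)
for (cell, month), count in sorted(monthly_counts.items()):
    print(cell, month, count)
```

If event timestamps were available, the same grouping key could be swapped for a week, day or hour to get the finer aggregation noted above.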


Status:

Terminated

Entered by:

November 13, 2017. Christina Zhang, cshuang.zhang@mail.utoronto.ca



CEM1002, Civil Engineering, University of Toronto

Contact: msf@eil.utoronto.ca