Predicting Cycling Accident Risk in Brussels: A Spatial Case-Control Approach

City:	Brussels, Brussels-Capital Region, Belgium
Organization:	Center for Operations Research and Econometrics, Université catholique de Louvain
Project Start Date:	2012 (Inferred)
Project End Date:	28 June 2013
Reference:	Vandenbulcke, G., Thomas, I., Int Panis, L. (2014). "Predicting cycling accident risk in Brussels: a spatial case-control approach", In Accident Analysis and Prevention, Vol. 62, pp. 341-357.
Problem:	Despite being the most highly urbanised part of Belgium, the Brussels-Capital Region (BCR) has an estimated cycling mode share of less than 4%. Although interest in cycling is growing, high cycling accident rates deter most residents from utility cycling. Currently, 62% of car trips are shorter than 5 km, representing a significant opportunity for cycling if accident rates can be reduced. While infrastructure design is known to affect accident rates, the impact of specific cycling facilities and road conditions on cycling accident risk is poorly studied. Thus, most attempts to reduce accidents focus on black spot analysis, i.e. looking at where accidents have been concentrated, rather than taking a network-level approach to predicting where accidents may occur.
Technical Solution:	Geospatial Case-Control Method: An epidemiology-inspired, geospatial case-control method was used to develop a binary dependent variable. For each accident location, or case point, in GIS, the dependent variable (target) was assigned a value of 1. Control points, with a target of 0, were created where no accidents had been reported. The density of control points was determined using a novel Potential Bicycle Traffic Index (PBTI). The PBTI, calculated for each sub-municipality, is a proxy for cycling demand based on a gravity trip generation model that incorporates mode share, population density, kilometres of cycling network, and distance. Controls were prevented from being placed at potential black spots using a Network Kernel Density Estimation computed with SANET v4 in ArcGIS. Data Preparation: Infrastructure and traffic condition risk factors were assigned to each case or control point geocoded in GIS. Risk factors including road characteristics, type of cycle facility, presence of tram tracks or traffic calming, and distance to parking or garages were then extracted. In addition, a road Complexity Index was constructed as a proxy for road legibility, incorporating the number of intersecting streets and the amount of marking or signage. Two-Stage Logit Model: A hierarchical, two-stage modelling approach (frequentist, then Bayesian) was employed to express the relative weight of each studied attribute in terms of the probability of a cycling accident. Initial values were obtained from a frequentist logit model run in SAS Enterprise Guide 4.2 and R 2.12.1, in order to identify the most significant risk factors.
After several goodness-of-fit and inferential statistical tests (likelihood ratio, Wald test, Wald chi-square statistic, log-likelihood, Akaike's Information Criterion, Hosmer-Lemeshow, and le Cessie-van Houwelingen) and multicollinearity diagnostics (Variance Inflation Factors and condition indices), these initial values were then used in a Bayesian model computed with R2WinBUGS to improve convergence of the Markov chain. Finally, two goodness-of-fit measures were applied, namely the Mean Absolute Predictive Error and the Deviance Information Criterion (DIC), to examine the trade-off between model fit and complexity. Additional Model Improvement: Join-count test statistics for spatial dependence and Moran's I tests informed the choice of risk factors. Autoregressive and autologistic models were tested to improve results, accounting for temporally and spatially autocorrelated data, respectively, in the dependent variable. In addition, heteroskedasticity was corrected using the Huber-White method.
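The frequentist stage of the logit model described above can be illustrated with a minimal sketch. It is not the authors' SAS, R, or WinBUGS workflow: it fits a logit model by plain gradient ascent on a binary case/control target with a single hypothetical risk factor (`tram_track`), and the counts are made up for illustration only. The functions `sigmoid` and `fit_logit` are names introduced here, not part of the study.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def fit_logit(X, y, lr=1.0, n_iter=3000):
    """Fit a logit model by batch gradient ascent on the mean log-likelihood.

    X: list of feature rows (no intercept column); y: list of 0/1 targets.
    Returns [intercept, beta_1, ..., beta_k].
    """
    n, k = len(X), len(X[0])
    beta = [0.0] * (k + 1)
    for _ in range(n_iter):
        grad = [0.0] * (k + 1)
        for row, target in zip(X, y):
            p = sigmoid(beta[0] + sum(b * x for b, x in zip(beta[1:], row)))
            err = target - p  # gradient contribution of this point
            grad[0] += err
            for j, x in enumerate(row):
                grad[j + 1] += err * x
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    return beta

# Illustrative (made-up) counts, NOT data from the study:
# hypothetical binary risk factor "tram_track" at case and control points.
cases_exposed, cases_unexposed = 30, 70
controls_exposed, controls_unexposed = 10, 90

# Target is 1 for case points (accidents) and 0 for control points.
X = ([[1]] * cases_exposed + [[0]] * cases_unexposed
     + [[1]] * controls_exposed + [[0]] * controls_unexposed)
y = ([1] * (cases_exposed + cases_unexposed)
     + [0] * (controls_exposed + controls_unexposed))

intercept, beta_tram = fit_logit(X, y)
odds_ratio = math.exp(beta_tram)
cross_product = (cases_exposed * controls_unexposed) / (cases_unexposed * controls_exposed)
print(f"fitted odds ratio: {odds_ratio:.3f}, 2x2 table odds ratio: {cross_product:.3f}")
```

With a single binary risk factor, the fitted odds ratio coincides with the cross-product ratio of the 2x2 case-control table, which gives a quick sanity check. In a case-control design the intercept depends on the ratio of cases to controls, so it should not be read as a baseline accident probability.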
Datasets Used:	Dataset 1: UrbIS GeoSpatial Databases, Brussels Regional Informatics Center (BRIC), 2012 Dataset 2: Cyclist Accidents, Directorate-General for Statistics and Economic Information (DGSEI), 2006-2008 Dataset 3: Socio-Economic Census, Directorate-General for Statistics and Economic Information (DGSEI), 2001 Dataset 4: Motorised Vehicle Traffic Volumes, Brussels' Institute for Environmental Management IBGE-BIM, 2006
Outcome:	General Outcomes: Although goodness-of-fit and inferential statistical tests indicated a successful model after the frequentist stage, the autologistic Bayesian model was found to be the most robust, showing convergence and the lowest DIC. This indicates that the case data are spatially autocorrelated, which limits many other methods currently in use. The model identified several previously unknown high-risk factors for cycling accidents, as well as factors that decrease risk. The Complexity Index explained 30% of risk. Among other factors, bridges without cycle facilities, intersections and roundabouts with separated cycle facilities, and shopping centres were associated with significantly increased risk. Interestingly, public transport stops did not affect risk, and contraflow cycling lanes decreased risk. Prediction in Areas with No Reporting: The model allows cycling accident risk to be predicted in areas where few or no cyclist accidents are currently reported. Typically, only 15% of cycling accidents are reported in Brussels, with a much lower likelihood of reporting in certain locations, for low-severity accidents, and for single-bicycle accidents. The model indicates that there is indeed high risk in some areas without any reports. Mapping Predicted Risk: Using the developed model and SANET v4 in ArcGIS, points were sampled every 10 m along the road network and interpolated along the line using a spline curve, in order to predict cycling accident risk across the entire region. This is the first network-level model that allows one or more risk factors to be mapped and analysed by planners and policymakers. Studying Effects over Time: In addition, having the year associated with each case point allows the effect of infrastructure changes over a number of years to be easily studied, which black spot analysis does not allow.
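The sampling step behind the risk map can be sketched as follows. This is only an illustration of placing points at a fixed spacing along one road segment, assuming planar coordinates in metres. It does not reproduce SANET's network handling or the spline interpolation, and `sample_along_polyline` is a hypothetical helper name.

```python
import math

def sample_along_polyline(vertices, spacing=10.0):
    """Return points every `spacing` units along a polyline, starting at its first vertex.

    vertices: list of (x, y) tuples in a planar coordinate system (assumed metres).
    """
    points = [vertices[0]]
    dist_to_next = spacing  # distance left to travel before the next sample
    for (x0, y0), (x1, y1) in zip(vertices, vertices[1:]):
        seg_len = math.hypot(x1 - x0, y1 - y0)
        pos = 0.0  # distance already travelled along this segment
        while seg_len - pos >= dist_to_next:
            pos += dist_to_next
            t = pos / seg_len
            points.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
            dist_to_next = spacing
        # Carry the unused remainder of this segment over to the next one.
        dist_to_next -= seg_len - pos
    return points

# Hypothetical 50 m road centreline with one right-angle bend.
road = [(0.0, 0.0), (25.0, 0.0), (25.0, 25.0)]
samples = sample_along_polyline(road, spacing=10.0)
for x, y in samples:
    print(f"({x:.1f}, {y:.1f})")
```

Each sampled point would then be scored with the fitted model to obtain a predicted risk value. Carrying the leftover distance across vertices keeps the spacing constant along the line rather than restarting at every bend.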
Issues that arose:	Most of the issues that arose relate to data preparation or assumptions. PBTI Gravity Model: There was no viable method of empirically validating the PBTI model, on which all control points are based, potentially introducing significant bias. Study Area: A 35-km buffer around the BCR was added to reduce the impact of edge effects. Parking Facilities and Garage Entrances: All discontinuities of cycle facilities around parking and garage entrances had to be digitised, a time-consuming process, and a new risk factor was created to represent crossing these areas. Distances to these facilities were then assigned to the closest case or control point. This process limits the reproducibility of the method in other jurisdictions. Other Data Completeness Issues: Several other data completeness issues arose. For example, accident records near tram tracks did not indicate whether the cyclist was riding parallel or perpendicular to the tracks. In addition, no traffic volume data were available for minor roads, so volume on all of these roads was assumed to be low (1 out of 5), which may introduce error.
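To make the PBTI issue above more concrete, the sketch below shows a generic gravity-type potential and how control points might be allocated in proportion to it. The exact PBTI formulation is not given in this entry, so the functional form, the distance-decay exponent `beta`, the zone "masses", and both function names are assumptions made purely for illustration.

```python
import math

def gravity_potential(masses, coords, beta=2.0):
    """Generic gravity-type potential per zone (assumed form, not the PBTI itself).

    For zone i: sum over other zones j of m_i * m_j / d_ij ** beta.
    """
    potentials = []
    for i, (mi, (xi, yi)) in enumerate(zip(masses, coords)):
        total = 0.0
        for j, (mj, (xj, yj)) in enumerate(zip(masses, coords)):
            if i == j:
                continue  # skip self-interaction (distance would be zero)
            d = math.hypot(xi - xj, yi - yj)
            total += mi * mj / d ** beta
        potentials.append(total)
    return potentials

def allocate_controls(potentials, n_controls):
    """Split n_controls across zones in proportion to each zone's potential."""
    total = sum(potentials)
    return [n_controls * p / total for p in potentials]

# Three hypothetical zones on a line, 1 km apart; the middle zone has twice the mass.
masses = [1.0, 2.0, 1.0]
coords = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]

potentials = gravity_potential(masses, coords)
shares = allocate_controls(potentials, n_controls=850)
print(potentials)
print(shares)
```

Because every control point inherits its density from such an index, any error in the assumed masses or distance decay propagates directly into the control sample, which is why the lack of empirical validation matters.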
Status:	Terminated
Entered by:	October 30, 2020: Adam Hasham, adam.hasham@mail.utoronto.ca

CEM1002,
Civil Engineering, University of Toronto
Contact: msf@eil.utoronto.ca