Modelling Road Network's Influence on Cyclist Behaviour

City: Cuenca, Province of Azuay, Ecuador
Organization: Universidad de Cuenca, Ecuador
Project Start Date: Unspecified (Most likely 2018 based on previous paper from Abad & Orellana)
Project End Date: Study published: August 19, 2019
Reference: Orellana, D., & Guerrero, M. L. (2019). Exploring the influence of road network structure on the spatial behaviour of cyclists using crowdsourced data. Environment and Planning B: Urban Analytics and City Science, 46(7), 1314–1330.
Problem: There is a lack of understanding of how the geometric and structural elements of a city's road network influence the travel patterns of cyclists. A better understanding of this influence can help urban planners select the most effective locations to build and improve cycling infrastructure.

Most research on cycling has been limited to developed countries with financial resources to implement traffic monitoring programs. Due to financial constraints, studies in Latin America have been limited in terms of the scale of data collection. The use of crowdsourced data from fitness apps such as Strava presents an opportunity to scale data collection without major increases in cost.
Technical Solution: Variable Selection
  • Dependent Variables:
    • Total Cyclists
    • Cyclists on Weekdays
    • Commuting Activities on Weekdays
  • Independent (Control) Variables:
    • Socio-economic Conditions:
      • Household density
      • Living Conditions Index (ICV)
      • Land Use Mixture
    • Infrastructure:
      • Road Hierarchy (1=arterial road, 0=other)
      • Existence of Segregated Bike Lane (1=yes, 0=no)
      • Number of Intersections
    • Physical:
      • Slope of Street Segment
    • Spatial:
      • Normalized choice (NACH): Space Syntax measure of how often a segment lies on the routes with the least angular deviation between other pairs of segments within radius r (i.e. a measure of directness and through-movement potential).
      • Normalized integration (NAIN): Space Syntax measure of how close each segment is to others within radius r.
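The NACH measure listed above can be illustrated with a short sketch. It assumes the normalization commonly cited in the Space Syntax literature (Hillier, Yang and Turner, 2012), NACH = log(CH + 1) / log(TD + 3), where CH is a segment's angular choice and TD its angular total depth at radius r. The summary above does not state the formula, so the function name `nach` and its inputs are illustrative assumptions rather than the authors' implementation.

```python
import math


def nach(choice: float, total_depth: float) -> float:
    """Normalized angular choice, assuming NACH = log(CH + 1) / log(TD + 3).

    choice      -- angular choice (CH) of the segment at radius r
    total_depth -- angular total depth (TD) of the segment at radius r
    """
    if choice < 0 or total_depth < 0:
        raise ValueError("choice and total_depth must be non-negative")
    # The +1 and +3 offsets keep both logarithms defined and the
    # denominator positive even when CH or TD is zero.
    return math.log(choice + 1) / math.log(total_depth + 3)


# Hypothetical values for two segments (not taken from the study).
print(nach(choice=0, total_depth=7))    # a segment no shortest route passes through
print(nach(choice=99, total_depth=97))  # a segment used by many shortest routes
```

In practice these values would come from the Space Syntax Toolkit output rather than being computed by hand; the sketch only shows how choice is scaled by depth so that segments in differently sized networks become comparable.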
Data Preparation
  • Network Simplification
    • The OSM and Strava networks were too complex on their own for calculating Space Syntax measures, so the authors simplified the network. This included removing the multiple lines that represent multi-lane roads, cleaning complex roundabouts, reconnecting intersections, and simplifying the topology of the network.
  • Mapping of Variables to Simplified Network
    • Socio-economic variables were provided as a hexagonal mesh on the map of Cuenca. Each road segment was assigned the values of the hexagon containing its centroid.
    • Infrastructure variables:
      • Road Hierarchy and Existence of Segregated Bike Lanes were available in OpenStreetMap.
      • Number of intersections was counted in each socio-economic hexagon.
    • Slope was calculated based on the altitude of segment end points from the Digital Terrain Model.
    • Space Syntax Measures were calculated on the simplified network using the Space Syntax Toolkit in QGIS.
    • Data from Strava and OSM were mapped onto the simplified network using spatial join.
  • Statistical Analysis
    • Researchers modelled each control variable's individual influence on the dependent variables using negative binomial regression models, which suit overdispersed count data such as cyclist counts. Most of the chosen variables were determined to have significant impacts on cycling route choice, except the Living Conditions Index (ICV) and Normalized Integration (NAIN).
      • ICV was therefore excluded from the Socio-economic dimension.
      • NAIN was excluded from the Spatial dimension.
      • For the Spatial dimension, researchers tested different radius values (r) to determine the value of r that had the most influence on the number of cyclists. NACH at that "global radius" was labelled NACH_Rn and was used for the Spatial dimension of the models moving forward.
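The radius test described above can be sketched as a simple ranking exercise. The summary does not state which criterion the researchers used to judge "most influence", so this sketch swaps in a stand-in: the absolute Pearson correlation between NACH at each radius and log-transformed cyclist counts. The function names, radius labels, and all numbers are hypothetical and chosen only to make the example run.

```python
import math


def pearson(xs, ys):
    """Plain Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def best_radius(nach_by_radius, counts):
    """Rank radii by |correlation| between NACH and log(count + 1).

    This is an assumed stand-in criterion, not the study's documented method.
    """
    log_counts = [math.log(c + 1) for c in counts]
    scores = {r: abs(pearson(values, log_counts))
              for r, values in nach_by_radius.items()}
    return max(scores, key=scores.get), scores


# Made-up cyclist counts for five segments and made-up NACH values per radius.
counts = [0, 3, 10, 25, 60]
nach_by_radius = {
    500: [0.9, 0.2, 1.1, 0.4, 0.8],   # hypothetical local radius (metres)
    2000: [0.1, 0.5, 0.9, 1.2, 1.5],  # hypothetical wider radius (metres)
}

radius, scores = best_radius(nach_by_radius, counts)
print(radius, scores)
```

A closer reproduction would fit one negative binomial model per radius and compare a fit statistic across them; the correlation ranking here only conveys the idea of scanning radii and keeping the one most strongly associated with cyclist counts.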
Data Analysis
  • Multiple Regression Models: Models of increasing complexity were tested on each Dependent variable (total cyclists, cyclists on weekdays, and commuting activities on weekdays).
    • Intercept Model: No Independent variables
    • Base Model: Socio-economic, Infrastructure, and Physical
    • Full Model: Socio-economic, Infrastructure, Physical, and Spatial
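The three nested models above are typically compared against the intercept-only model. The summary reports "pseudo R2" without naming the variant, so the sketch below assumes McFadden's pseudo R2, defined as 1 - LL(model) / LL(intercept model), where LL is the fitted log-likelihood. The log-likelihood values are invented for illustration and are not results from the study.

```python
def mcfadden_pseudo_r2(ll_model: float, ll_intercept: float) -> float:
    """McFadden's pseudo R2: 1 - LL(model) / LL(intercept-only model)."""
    if ll_intercept == 0:
        raise ValueError("intercept-only log-likelihood must be non-zero")
    return 1.0 - ll_model / ll_intercept


# Hypothetical log-likelihoods for the three nested models (made-up numbers).
ll_intercept = -2000.0  # Intercept Model: no independent variables
ll_base = -1900.0       # Base Model: socio-economic, infrastructure, physical
ll_full = -1800.0       # Full Model: base model plus the spatial variable

r2_base = mcfadden_pseudo_r2(ll_base, ll_intercept)
r2_full = mcfadden_pseudo_r2(ll_full, ll_intercept)
print(f"base: {r2_base:.3f}, full: {r2_full:.3f}")
```

Under this definition the intercept model scores exactly zero, and each added group of variables is judged by how far it raises the log-likelihood above that baseline.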
Datasets Used:
  • Dataset 1: Geographic Road Network, OpenStreetMap (OSM), 2016
  • Dataset 2: Crowdsourced GPS Cycling Activity Data, Strava Metro, September 1, 2014 to September 30, 2015
  • Dataset 3: Household Density and Socio-economic Data, National Census Data (INEC), 2010
  • Dataset 4: Digital Terrain Model (3m resolution), MAGAP (Ministry of Agriculture and Livestock), 2012
  • Dataset 5: Land-Use Data, Municipality of Cuenca, (Year not provided)
Outcome: Model Performance
  • Base Model:
    • Total cyclists: pseudo R2=0.038
    • Cyclists on weekdays: pseudo R2=0.038
    • Commuting on weekdays: pseudo R2=0.029
  • Full Model:
    • Total cyclists: pseudo R2=0.055 (44.7% improvement over base model)
    • Cyclists on weekdays: pseudo R2=0.054 (42% improvement over base model)
    • Commuting on weekdays: pseudo R2=0.042 (44.8% improvement over base model)
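The improvement percentages above follow from the relative change in pseudo R2 between the base and full models, (full - base) / base. A minimal check using the reported values (the function name `relative_improvement` is illustrative):

```python
def relative_improvement(base_r2: float, full_r2: float) -> float:
    """Percentage gain of the full model's pseudo R2 over the base model's."""
    return (full_r2 - base_r2) / base_r2 * 100


# Pseudo R2 values as reported in the Outcome section: (base model, full model).
reported = {
    "Total cyclists": (0.038, 0.055),
    "Cyclists on weekdays": (0.038, 0.054),
    "Commuting on weekdays": (0.029, 0.042),
}

for name, (base, full) in reported.items():
    print(f"{name}: {relative_improvement(base, full):.1f}%")
```

Because the pseudo R2 values are rounded to three decimals, the recomputed percentages can differ slightly from figures derived from unrounded model output.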
Key Findings from Full Model
  • Introducing a single spatial variable (NACH_Rn) representing network structure improves the model's pseudo R2 by 42% to 45%, depending on the dependent variable.
  • Most influential road network characteristics were hierarchy, dedicated cycling lanes, and directness (NACH_Rn).
  • Slope of segments had the lowest effect on cycling activity among the variables in the model.
Issues that arose:
  • The Strava data set is biased towards users who own smart devices, which could skew results by socio-economic status.
    • Authors of the study suggest that this has less of an effect in Cuenca since there is relatively low socio-spatial segregation in the city. The bias could be stronger in other cities.
    • Strava data also has a gender bias. The Strava data set had 12% female users compared to 21% according to Ecuador's National Cyclist Profile. It is worth noting that 13% of Strava users did not report their gender compared to 1% in the National Cyclist Profile.
  • The study was limited to the city of Cuenca. Further research is needed in other regions to confirm that these results apply to other municipalities in Latin America.
Status: Operational: Approach is replicable in any area where OSM and Strava Metro data are available. Next steps of research include studying other cities with different socio-economic profiles and refining the regression models with additional sources of data.
Entered by: 28-09-2019: Riccardo Caimano, riccardo.caimano@mail.utoronto.ca.


CEM1002,
Civil Engineering, University of Toronto
Contact: msf@eil.utoronto.ca