City

Many, including London, Boston, Tokyo, San Francisco,…

Organization

Roberto Catini, Dmytro Karamshuk, Orion Penner, and Massimo Riccaboni

Project Start Date

Unknown

Project End Date

14 May 2015 (published)

Reference

Catini, R., Karamshuk, D., Penner, O., & Riccaboni, M. (2015). Identifying geographic clusters: A network analytic approach. Research Policy 44, 1749–1762.

Problem

Activity clusters in cities have typically been studied at an aggregated, static level (e.g. MSA, TAZ). The authors propose a method for identifying functional clusters and tracking their evolution over time, as “…clusters change their location, size and performances over time and tend not to fit static administrative boundaries” (p. 1).

Technical Solution

·       Geocoding locations with the help of Yahoo! Geocode API

·       Identifying unique institution names from text strings using the method of Jonnalagadda and Topham (2010)

·       Forming clusters

o   take an arbitrary location and assign it to a new cluster;

o   find all locations closer than distance l to that location and assign them to the same cluster;

o   recursively add locations closer than l to at least one location already in the cluster, until there are no new locations within distance l of any added location;

o   repeat the three steps above for any location not yet assigned to a cluster.

The distance l was chosen as 1 km to approximate walking distance.

·       To find cluster cores, k-shell decomposition was performed with the k-index being the number of citations (rather than the traditional number of links). For analysis and city comparisons, a k-index of one tenth of the maximum k-index was chosen for each city.

·       To estimate the average quality of research within a core, the authors average the Impact Factors (IF) of the publications in that core. A publication’s IF is the number of citations it has.
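
The cluster-forming steps listed above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function names haversine_km and form_clusters, the choice of great-circle (haversine) distance, and the brute-force neighbor search are all assumptions made here. The summary above only specifies the distance threshold l = 1 km and the recursive assignment rule.

```python
import math
from collections import deque

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumption of this sketch


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def form_clusters(locations, l_km=1.0):
    """Assign a cluster label to every location.

    Two locations end up in the same cluster if they are connected by a
    chain of locations in which each consecutive pair is closer than l_km.
    Returns a list of integer labels, one per input location.
    """
    labels = [None] * len(locations)
    next_label = 0
    for seed in range(len(locations)):
        if labels[seed] is not None:
            continue  # already absorbed into an earlier cluster
        # Step 1: take an unassigned location and start a new cluster.
        labels[seed] = next_label
        queue = deque([seed])
        # Steps 2 and 3: keep adding locations closer than l_km to any
        # location already in the cluster until none remain.
        while queue:
            current = queue.popleft()
            for other in range(len(locations)):
                if labels[other] is None and \
                        haversine_km(locations[current], locations[other]) < l_km:
                    labels[other] = next_label
                    queue.append(other)
        # Step 4: the outer loop moves on to the next unassigned location.
        next_label += 1
    return labels


if __name__ == "__main__":
    # Hypothetical points on the equator: the first three form a chain with
    # gaps of about 0.89 km each; the fourth is roughly 111 km away.
    points = [(0.0, 0.0), (0.0, 0.008), (0.0, 0.016), (0.0, 1.0)]
    print(form_clusters(points))
```

Note that the first and third points in the example are more than 1 km apart yet share a cluster, because the second point links them. The brute-force search is quadratic in the number of locations; a dataset of the size described below would likely need a spatial index, which this sketch leaves out.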

Datasets Used

23 million publications in the biomedical sciences from National Library of Medicine’s PubMed database, filtered down to just over 6 million.

Outcome

·       A map of the results can be found here: https://docs.google.com/file/d/0B5YBN19D4CsGdE50bGpGY0JaVGM/edit?pli=1

·       The amount of overlap between the clusters generated and those implied by the existing U.S. MSAs ranged from 42% for Los Angeles–Long Beach–Anaheim to 91% for Boston–Cambridge–Newton.

·       After k-shell decomposition, some cities are found to be relatively monocore (e.g. Toronto) and others multicore (e.g. New York City), and some cores are more impactful and/or productive than others

·       The fact that some clusters cross jurisdictional boundaries has policy implications for regions attempting to foster growth and innovation

·       Clusters change over time

Issues that arose

·       Name extraction and text disambiguation: Getting unique institution names from publication records is difficult because the name is included in a single field along with other attributes such as department name and address. Further, acronyms, word omissions, typos, and alternative word orderings make identifying unique institution names difficult. Manual validation indicated that 92% of institution names were extracted correctly, and 72% of extracted institution names were disambiguated correctly.

·       Less than half the available data were used

Status

·       Technical improvements can be made to geocoding and disambiguation methods

·       Instead of applying a uniform distance threshold for a node’s addition to a cluster, the addition of nodes could be determined by evidence of real local relationships, e.g. actual co-authorships.

·       There is room to conduct further empirical studies of cluster dynamics and locational advantages, as well as to extend the presented framework to other types of spatial activities and relationships

·       Paper cited 15 times as of 25 September 2018 according to Google Scholar

Entered by

Khalil J. Martin