City |
Many, including London, Boston, Tokyo, San Francisco,… |
Organization |
Roberto Catini, Dmytro Karamshuk,
Orion Penner, and Massimo Riccaboni |
Project Start Date |
Unknown |
Project End Date |
14 May 2015 (published) |
Reference |
Catini, R., Karamshuk, D., Penner, O., & Riccaboni, M. (2015). Identifying geographic clusters: A network analytic approach. Research Policy, 44, 1749–1762. |
Problem |
Activity clusters in cities have been studied at an aggregated, static level (e.g., MSA, TAZ). The authors propose a method for identifying functional clusters and tracking their evolution over time, since "…clusters change their location, size and performances over time and tend not to fit static administrative boundaries" (p. 1). |
Technical Solution |
· Geocoding locations with the help of the Yahoo! Geocode API
· Identifying unique institution names from text strings using the method of Jonnalagadda and Topham (2010)
· Forming clusters, with l chosen as 1 km to represent approximate walking distance:
o take an arbitrary location and assign it to a new cluster;
o find all locations closer than distance l to that location and assign them to the same cluster;
o recursively add locations closer than l to at least one location already in the cluster, until no new locations lie within distance l of any added location;
o repeat steps 1 to 3 for any location not yet assigned to a cluster.
· Finding cluster cores: k-shell decomposition was performed with the k-index based on the number of citations (rather than the traditional number of links). For analysis and city comparisons, a k-index threshold of 1/10th of the maximum k-index was chosen for each city.
· Estimating research quality: to estimate the average quality of research within a core, the authors average the Impact Factors (IF) of the publications in that core, where a publication's IF is the number of citations it has. |
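The cluster-forming steps above amount to grouping locations that are chained together by hops shorter than l. A minimal Python sketch, assuming locations are given as (latitude, longitude) pairs in degrees and that distance is great-circle (haversine) distance with a mean Earth radius of 6371 km; the function names and the choice of distance formula are illustrative assumptions, not taken from the paper:

```python
import math
from collections import deque

EARTH_RADIUS_KM = 6371.0  # assumption: mean Earth radius for haversine distance


def haversine_km(p, q):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, p)
    lat2, lon2 = map(math.radians, q)
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def distance_clusters(points, l_km=1.0):
    """Assign a cluster label to each point by threshold chaining (sketch).

    Naive O(n^2) scan; a spatial index would be needed for large datasets.
    """
    labels = [None] * len(points)
    next_label = 0
    for seed in range(len(points)):
        if labels[seed] is not None:
            continue  # already assigned to an earlier cluster
        labels[seed] = next_label  # step 1: start a new cluster
        queue = deque([seed])
        while queue:  # steps 2-3: add every unassigned point within l of a member
            i = queue.popleft()
            for j in range(len(points)):
                if labels[j] is None and haversine_km(points[i], points[j]) < l_km:
                    labels[j] = next_label
                    queue.append(j)
        next_label += 1  # step 4: move on to the next unassigned location
    return labels


if __name__ == "__main__":
    pts = [(42.360, -71.060), (42.367, -71.060), (42.374, -71.060), (42.400, -71.060)]
    print(distance_clusters(pts))  # → [0, 0, 0, 1]
```

Because a location joins a cluster when it is within l of any current member, the first and third points end up together even though they are about 1.56 km apart; this chaining is what lets clusters follow the shape of activity rather than a fixed radius.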
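The core-finding and quality-estimation steps can be sketched as a weighted variant of the standard k-shell peeling algorithm. The entry does not say exactly how citations replace links, so this sketch assumes (hypothetically) that a node's score is the total citations of its remaining neighbours, where a node's citations are the sum of its publications' citation counts. Nodes are peeled in order of lowest score, each node's shell index is the running maximum of scores at removal, and the core keeps nodes whose shell index is at least 1/10 of the maximum. Function names and toy data are illustrative:

```python
import heapq


def citation_shells(adjacency, citations):
    """Citation-weighted k-shell indices (sketch under a stated assumption).

    adjacency: dict node -> set of neighbouring nodes (undirected, symmetric)
    citations: dict node -> total citations of the node's publications
    ASSUMPTION: a node's score is the summed citations of its remaining
    neighbours, replacing the usual degree (count of remaining neighbours).
    """
    score = {n: sum(citations[m] for m in adjacency[n]) for n in adjacency}
    heap = [(s, n) for n, s in score.items()]
    heapq.heapify(heap)
    removed, shell, current_k = set(), {}, 0
    while heap:
        s, n = heapq.heappop(heap)
        if n in removed or s != score[n]:
            continue  # stale heap entry from before a score update
        current_k = max(current_k, s)  # shell index never decreases while peeling
        shell[n] = current_k
        removed.add(n)
        for m in adjacency[n]:
            if m not in removed:
                score[m] -= citations[n]  # neighbour loses n's citation weight
                heapq.heappush(heap, (score[m], m))
    return shell


def core_quality(shell, pubs, fraction=0.1):
    """Core = nodes with shell >= fraction * max shell; returns (core, mean IF),
    where IF is taken, as in the entry, to be a publication's citation count."""
    threshold = fraction * max(shell.values())
    core = {n for n, k in shell.items() if k >= threshold}
    core_pubs = [c for n in core for c in pubs[n]]
    mean_if = sum(core_pubs) / len(core_pubs) if core_pubs else 0.0
    return core, mean_if


if __name__ == "__main__":
    pubs = {"A": [10, 5], "B": [20], "C": [8, 3], "D": [2], "E": [0]}
    citations = {n: sum(c) for n, c in pubs.items()}
    adjacency = {"A": {"B", "C", "D"}, "B": {"A", "C"}, "C": {"A", "B"},
                 "D": {"A", "E"}, "E": {"D"}}
    shell = citation_shells(adjacency, citations)
    print(shell)
    print(core_quality(shell, pubs))
```

In the toy network, peripheral node E is peeled first with a score of 2, which falls below 1/10 of the maximum shell index, so only A to D form the core.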
Datasets Used |
23 million publications in the biomedical sciences from the National Library of Medicine's PubMed database, filtered down to just over 6 million. |
Outcome |
· A map of the results can be found here: https://docs.google.com/file/d/0B5YBN19D4CsGdE50bGpGY0JaVGM/edit?pli=1
· The overlap between the generated clusters and those implied by the existing U.S. MSAs ranged from 42% for Los Angeles–Long Beach–Anaheim to 91% for Boston–Cambridge–Newton.
· After k-shell decomposition, some cities are found to be relatively monocore (e.g., Toronto) and others multicore (e.g., New York City), and some cores are more impactful and/or productive than others.
· The fact that some clusters cross jurisdictional boundaries has policy implications for regions attempting to foster growth and innovation.
· Clusters change over time. |
Issues that arose |
· Name extraction and text disambiguation: Getting unique institution names from publication records is difficult because the name is included in a single field along with other attributes such as department name, address, etc. Further, acronyms, word omissions, typos, and alternative word orderings make identifying unique institution names difficult. Manual validation indicated that 92% of institution names were extracted correctly, and 72% of extracted institution names were disambiguated correctly.
· Less than half of the available data were used. |
Status |
· Technical improvements can be made to the geocoding and disambiguation methods.
· Instead of applying a uniform distance threshold for adding a node to a cluster, the addition of nodes could be determined by evidence of real local relationships, e.g., actual co-authorships.
· There is room to conduct further empirical studies of cluster dynamics and locational advantages, as well as to extend the presented framework to other types of spatial activities and relationships.
· Paper cited 15 times as of 25 September 2018 according to Google Scholar |
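The suggested alternative to a uniform distance threshold could look roughly like the following. This is a hypothetical sketch, not something the authors implemented: two locations are linked only if they share at least `min_papers` co-authored publications (a made-up parameter), and clusters are the connected components of that co-authorship graph, grown with the same breadth-first expansion as the distance-based procedure:

```python
from collections import defaultdict, deque


def coauthorship_clusters(locations, coauthored_pairs, min_papers=1):
    """HYPOTHETICAL variant: cluster locations via co-authorship links.

    locations: list of location ids
    coauthored_pairs: iterable of (loc_a, loc_b), one entry per co-authored paper
    min_papers: assumed minimum number of shared papers needed for a link
    """
    loc_set = set(locations)
    counts = defaultdict(int)
    for a, b in coauthored_pairs:
        if a != b and a in loc_set and b in loc_set:
            counts[frozenset((a, b))] += 1  # count papers per unordered pair
    neighbours = defaultdict(set)
    for pair, n in counts.items():
        if n >= min_papers:
            a, b = tuple(pair)
            neighbours[a].add(b)
            neighbours[b].add(a)
    labels, next_label = {}, 0
    for seed in locations:
        if seed in labels:
            continue
        labels[seed] = next_label  # start a new cluster at an unassigned location
        queue = deque([seed])
        while queue:  # add every location linked to a current member
            current = queue.popleft()
            for other in neighbours[current]:
                if other not in labels:
                    labels[other] = next_label
                    queue.append(other)
        next_label += 1
    return labels
```

Raising `min_papers` demands stronger evidence of a real relationship before two locations are merged; combining it with a distance cap would keep clusters local.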
Entered by |
Khalil J. Martin |