Combining disparate data sources for improved poverty prediction and mapping

 

City/Region:

Senegal

Organization:

PNAS: Proceedings of the National Academy of Sciences of the United States of America

Project Start Date:

Unknown

Project End Date:

Published November 2017

Reference:

Combining disparate data sources for improved poverty prediction and mapping

Neeti Pokhriyal, Damien Christophe Jacques

Proceedings of the National Academy of Sciences Nov 2017, 114 (46) E9783-E9792; DOI: 10.1073/pnas.1700319114

Problem:

More than 330 million people are still living in extreme poverty in Africa. Good-quality data for assessing poverty regularly, and for designing policies that support economic development, are lacking.

 

Governments and development agencies require a baseline depiction. Poverty maps are needed for efficient targeting of policies and to assess the impact of interventions.

 

Currently, the most reliable source is household surveys. However, this approach is time-consuming, expensive and only captures a small sample (versus the larger population), making timely updates of poverty challenging.

 

This paper attempts to generate accurate poverty maps from two data sources: one collected from communication devices (phones) and the other from sensors (satellites, weather/ground sensors).

Technical Solution:

This paper attempts to accurately predict the Global Multidimensional Poverty Index (MPI, a proxy for poverty) for 552 communes in Senegal using environmental data (relating to food security, economic activity and accessibility of facilities) and call data records (capturing individual, spatial and temporal aspects of people's behaviour).

 

Earth Observation satellites can collect data on metrics such as vegetation cover, meteorological conditions and night-time lights. These datasets are cheaper to obtain, have global coverage and high revisit capability. Another resource this paper uses is Geographic Information Systems (GIS) analysis, which relates to proximity to important services and density of infrastructure. Satellite and GIS data are useful for understanding the availability of and access to resources (both natural and man-made), but they lack information on the micro- and macro-behaviour of individuals and households, cultural backgrounds and socioeconomic features. To capture this information, the study uses call data records (CDRs).

The dependent variable is the Global Multidimensional Poverty Index (MPI), an internationally comparable measure. It is a composite of 10 indicators across 3 critical dimensions: education (years of schooling, school enrollment), health (malnutrition, child mortality), and standard of living (cooking fuel, sanitation, access to drinking water, electricity, floor, and asset ownership).

To predict poverty for a commune, the paper uses models trained independently on 2 data sources: CDRs and environmental data.

 

A quantitative validation of the predictions generated from the framework (described above) is provided against commune-level poverty values estimated from (previously collected) census data. This is done using cross-validation procedures.
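
The validation step can be illustrated with a minimal sketch of k-fold cross-validation. The one-feature least-squares model, the toy values and the function names (`fit_line`, `k_fold_rmse`) are assumptions made for illustration only; this summary does not state which model or fold scheme the paper used.

```python
import random
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x with a single feature."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def k_fold_rmse(xs, ys, k=5, seed=0):
    """Average root-mean-square error on the held-out fold, over k folds."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    scores = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in idx if i not in held]
        # Fit only on the training communes, then score the held-out ones.
        a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
        errs = [(ys[i] - (a + b * xs[i])) ** 2 for i in held_out]
        scores.append((sum(errs) / len(errs)) ** 0.5)
    return mean(scores)


# Hypothetical toy data: one feature per commune and a commune-level MPI value.
lights = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
mpi_values = [0.75 - 0.5 * x for x in lights]  # exactly linear by construction
rmse = k_fold_rmse(lights, mpi_values, k=5)
```

Because the toy target is exactly linear in the feature, the held-out error here is essentially zero; with real commune-level data the same loop would report how well predictions generalise to communes not used for fitting.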

Datasets Used:

  • (Independent Variable) Call Data Records (CDRs): capture how, when, where and with whom individuals communicate.
    • Capture regularity, diversity and spatiotemporal variability in the user's mobile interaction.
    • The data belong to the subscribers of Sonatel (Orange), which is the dominant telecom provider in Senegal.
    • The data are anonymized and span a period from January 1 to December 31, 2013. They contain more than 9.54 million unique aliased mobile phone subscribers.
  • (Independent Variable) Environmental data: includes data related to food security, economic activity and access to services. They are based on Geographic Information System (GIS) analysis, Earth Observation data, or weather stations.
    • Food security: mainly described by agro-meteorological measurements (temperature, precipitation, slope, elevation, soil type) that drive agricultural production (crop production), which is, along with livestock and fishing, one of the most important inputs to food availability in the country. Access to staple food, in turn, can be approximated by the average millet prices observed in the markets (retail prices in 56 local markets). Millet is the main local staple food crop in the country, making its price a potentially good indicator of poverty. In addition, proximity to main roads and urban centers was computed to describe connectivity to major markets.
    • Economic activity: economic activity corresponds to the intensity of urbanization. Among the studied features, nighttime lights are the most frequently used to describe poverty using remote-sensing data.
    • Access to services: proximity to schools, water towers, and hospitals can be used to determine deprivation in education, water, and health, respectively.
    • All environmental data are available at high spatial resolution, with the exception of crop production and millet prices.

  • (Dependent Variable) Global Multidimensional Poverty Index (MPI): an internationally comparable measure; a composite of 10 indicators across 3 critical dimensions: education (years of schooling, school enrollment), health (malnutrition, child mortality), and standard of living (cooking fuel, sanitation, access to drinking water, electricity, floor, and asset ownership).
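
How 10 indicators become a single MPI value can be sketched as follows. The weights (one third per dimension, split equally among that dimension's indicators), the one-third poverty cutoff and the formula MPI = H × A (headcount ratio times average intensity among the poor) follow the standard global MPI convention and are an assumption here, since this summary does not state them. The indicator keys and example households are hypothetical.

```python
from fractions import Fraction

# Assumed standard weighting: each of the 3 dimensions carries 1/3.
WEIGHTS = {
    "years_of_schooling": Fraction(1, 6),
    "school_enrollment": Fraction(1, 6),
    "malnutrition": Fraction(1, 6),
    "child_mortality": Fraction(1, 6),
    "cooking_fuel": Fraction(1, 18),
    "sanitation": Fraction(1, 18),
    "drinking_water": Fraction(1, 18),
    "electricity": Fraction(1, 18),
    "floor": Fraction(1, 18),
    "assets": Fraction(1, 18),
}
CUTOFF = Fraction(1, 3)  # assumed threshold for being multidimensionally poor


def deprivation_score(deprived):
    """Weighted share of indicators in which a household is deprived."""
    return sum(WEIGHTS[name] for name in deprived)


def mpi(households):
    """households: list of sets naming the indicators each household lacks."""
    scores = [deprivation_score(h) for h in households]
    poor = [s for s in scores if s >= CUTOFF]
    if not poor:
        return Fraction(0)
    headcount = Fraction(len(poor), len(households))  # H: share who are poor
    intensity = sum(poor) / len(poor)                 # A: mean score of the poor
    return headcount * intensity


# Hypothetical commune with four households.
households = [
    {"years_of_schooling", "malnutrition"},              # score 1/3, poor
    {"cooking_fuel", "sanitation"},                      # score 1/9, not poor
    {"years_of_schooling", "school_enrollment",
     "cooking_fuel", "sanitation", "electricity"},       # score 1/2, poor
    set(),                                               # score 0, not poor
]
commune_mpi = mpi(households)  # Fraction(5, 24)
```

In this toy commune half of the households are poor (H = 1/2) and their average deprivation score is 5/12 (A), giving an MPI of 5/24, or roughly 0.21.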

Outcome:

The model's estimates of poverty (MPI) are statistically significant. All deprivations (the 10 MPI indicators) are better predicted when CDR and environmental data are used together.

 

Indicators related to education:

  • Use of short message service is indicative of literacy.

  • Environmental data capture distance to schools, main roads and urban centres, all of which facilitate educational attainment.

Indicators related to health:

  • CDR data do not capture youth/children, so they are not significant for these indicators.

 

Results:

  • Nighttime lights show a significant correlation with MPI; urban areas and road density are two other important indicators of economic activity.

  • CDR data

    • The number of active days (for calls and texts) is a strong negative predictor of poverty. Individuals in wealthier communes have the monetary resources to recharge their phones and make/receive calls.

    • The ratio of calls to texts: a high preference for calls is an important predictor of education-based deprivations.

    • Features that indicate diversity in communication have a negative relationship to poverty.

    • A delay in responding to texts has a positive relationship to poverty.

    • The percentage of initiated calls has a positive relationship to poverty (poorer individuals are more likely to initiate calls to request resources).
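
The CDR features listed above can be illustrated with a minimal sketch that derives a few of them from raw records. The record schema (`sender`, `receiver`, `kind`, `day`), the function name `cdr_features` and the toy records are hypothetical; the number of distinct contacts is used only as a crude stand-in for communication diversity, and the paper's actual feature definitions may differ.

```python
from collections import namedtuple

# Hypothetical record schema: who contacted whom, by call or text, on which day.
Record = namedtuple("Record", "sender receiver kind day")


def cdr_features(records, user):
    """Compute a few per-user features from a list of Record tuples."""
    mine = [r for r in records if user in (r.sender, r.receiver)]
    calls = [r for r in mine if r.kind == "call"]
    texts = [r for r in mine if r.kind == "text"]
    initiated = [r for r in calls if r.sender == user]
    contacts = {r.receiver if r.sender == user else r.sender for r in mine}
    return {
        # Distinct days with any call or text activity.
        "active_days": len({r.day for r in mine}),
        # Preference for calls over texts (None when the user has no texts).
        "call_text_ratio": len(calls) / len(texts) if texts else None,
        # Share of the user's calls that the user started.
        "percent_initiated_calls": (
            100 * len(initiated) / len(calls) if calls else None
        ),
        # Crude diversity proxy: number of distinct contacts.
        "n_contacts": len(contacts),
    }


records = [
    Record("a", "b", "call", 1),
    Record("b", "a", "call", 1),
    Record("a", "c", "text", 2),
    Record("a", "b", "call", 3),
    Record("c", "a", "text", 3),
    Record("b", "c", "call", 4),
]
features = cdr_features(records, "a")
```

For user "a" in this toy log, the sketch yields 3 active days, a call-to-text ratio of 1.5, about 66.7% initiated calls and 2 distinct contacts. Features like these would then be aggregated to the commune level before being related to MPI.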

 

Overall, the poverty map of Senegal produced from CDRs and environmental data is accurate (statistically significant) when compared with commune-level poverty levels.

Issues that arose:

A key issue related to using CDR data for population-level analyses is the selection bias arising from mobile phone ownership. Yet, in Senegal, there were 92.93 mobile phone subscriptions per 100 inhabitants (2013), implying that most of the population owns cell phones.

 

The second issue is the bias arising when using data from only one provider. The provider of the data used here is Sonatel. In 2013, Sonatel had nearly 62% of the cell phone market.

 

The third issue is that some demographic subgroups (children and the ultra-poor population) are left out of the analysis when only CDR data are used.

Status:

In Development

Entered by:

28 September 2019, Asli Ersozoglu, asli.ersozoglu@mail.utoronto.ca



CEM1002,
Civil Engineering, University of Toronto
Contact: msf@eil.utoronto.ca