Welcome to the Database on Innovations in Migration Statistics - DIMiS
The database presents information on innovations in migration statistics collected by the UNECE Task Force on New Data Sources for Migration Statistics, covering activities by National Statistical Institutes, other organizations and researchers.
General information on the database, its purpose and the variables is available here.
SORTING THE TABLE: The table can be sorted by clicking the header of any column.
FILTER: Clicking the right end of a column header activates the filter function, which lets you select among existing keywords for that column or define a new keyword.
The database is also available in Excel format: DIMiS.xlsx
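For readers working with the Excel export rather than the interactive table, the sorting and filtering described above can be approximated offline. The following is a minimal Python sketch, not part of DIMiS: the field names (`authors`, `year`, `country`, `source`) and the helper functions `sort_by` and `filter_by` are hypothetical, and the four sample records are abbreviated from entries in the table below. The actual column headers in DIMiS.xlsx may differ.

```python
# Sample records abbreviated from the table; field names are assumptions,
# not the real DIMiS.xlsx column headers.
records = [
    {"authors": "Beine M. et al.", "year": 2021, "country": "Turkey", "source": "MNO"},
    {"authors": "Wanner P.", "year": 2020, "country": "Switzerland", "source": "Search engine(s)"},
    {"authors": "Lai S. et al.", "year": 2019, "country": "Namibia", "source": "MNO"},
    {"authors": "Rayer S.", "year": 2018, "country": "USA (Puerto Rico)", "source": "Others (flight passengers data)"},
]


def sort_by(rows, column, descending=False):
    """Return a new list of rows ordered by the given column (hypothetical helper)."""
    return sorted(rows, key=lambda row: row[column], reverse=descending)


def filter_by(rows, column, keyword):
    """Keep rows whose column value contains the keyword, ignoring case (hypothetical helper)."""
    needle = keyword.lower()
    return [row for row in rows if needle in str(row[column]).lower()]


# Mimic clicking the "year" header, then filtering the data source column on "MNO".
oldest_first = sort_by(records, "year")
mno_only = filter_by(records, "source", "mno")

for row in mno_only:
    print(row["year"], row["authors"], row["country"])
```

The filter uses a case-insensitive substring match so that combined values such as "MNO / Social media" would also be found when filtering on "MNO"; an exact match would be stricter but would miss those rows.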
|Beine M., L. Bertinelli, R. Cömertpay, A. Litina, J.-F. Maystadt||2021||A gravity analysis of refugee mobility using mobile phone data||Journal of Development Economics, Volume 150, May 2021, 102618||The objective of this study consists in analyzing the determinants of the internal mobility of refugees in Turkey. We track down this mobility relying on geolocalized mobile phone calls data and bring these measures to a micro-founded gravity model in order to estimate the main drivers of refugee mobility across 26 regions in 2017. Our results show that the movements of refugees are sensitive to income differentials and contribute therefore to a more efficient allocation of labor across space. Comparing these findings with those of individuals with a non-refugee status, we find that refugees are more sensitive to variations of income at origin and to distance, while less responsive to changes in income at destination. These findings are robust to the way mobility is inferred from phone data and to the choice of the geographical unit of investigation. Further, we provide evidence against some alternative explanations of mobility such as the propensity to leave refugee camps, transit through Turkey, social magnet effects and sensitivity to agricultural business cycles.||internal migration / population displacement||Turkey||MNO|
|Melachrinos C., M. Carammia, T. Wilkin||2020||Using big data to estimate migration “push factors” from Africa||Migration in West and North Africa and across the Mediterranean - Chapter 8, IOM Publications||This chapter explores the use of big data to estimate monthly country-level “push factors” of asylum-related migration. It also looks at whether estimates of push factors in countries of origin correlate with traditional data on irregular migration on the Central Mediterranean Route and asylum applications lodged in Italy. The frequency of negative and disruptive events in individual countries was aggregated into a composite Push Factor Index, which strongly correlates with applications for asylum in Europe in 2016 and 2017. However, following the effective closure of the Central Mediterranean Route in 2018 and 2019, this correlation was no longer apparent, showing that the explanatory power of the Push Factor Index is dependent on enabling factors.||international migration||EU Member States, Norway, Switzerland||Registers (irregular border crossing, asylum applications) / Others (Global Database on Events, Language and Tone)|
|Luca M., G. Barlacchi, N. Oliver, B. Lepri||2021||Leveraging Mobile Phone Data for Migration Flows||Forthcoming book chapter in "Data Science for Migration and Mobility Studies" edited by Dr. Emre Eren Korkmaz, Dr. Albert Ali Salah||Statistics on migration flows are often derived from census data, which suffer from intrinsic limitations, including costs and infrequent sampling. When censuses are used, there is typically a time gap - up to a few years - between the data collection process and the computation and publication of relevant statistics. This gap is a significant drawback for the analysis of a phenomenon that is continuously and rapidly changing. Alternative data sources, such as surveys and field observations, also suffer from reliability, costs, and scale limitations. The ubiquity of mobile phones enables an accurate and efficient collection of up-to-date data related to migration. Indeed, passively collected data by the mobile network infrastructure via aggregated, pseudonymized Call Detail Records (CDRs) is of great value to understand human migrations. Through the analysis of mobile phone data, we can shed light on the mobility patterns of migrants, detect spontaneous settlements and understand the daily habits, levels of integration, and human connections of such vulnerable social groups. This Chapter discusses the importance of leveraging mobile phone data as an alternative data source to gather precious and previously unavailable insights on various aspects of migration. Also, we highlight pending challenges that would need to be addressed before we can effectively benefit from the availability of mobile phone data to help make better decisions that would ultimately improve millions of people's lives.||international migration||not applicable||MNO|
|Fiorio L., E. Zagheni, G. Abel, J. Hill, G. Pestre, E. Letouzé, J. Cai||2021||Analyzing the effect of time in migration measurement using geo-referenced digital trace data||Demography, 1–24 (2021)||Georeferenced digital trace data offer unprecedented flexibility in migration estimation. Because of their high temporal granularity, many migration estimates can be generated from the same data set by changing the definition parameters. Yet despite the growing application of digital trace data to migration research, strategies for taking advantage of their temporal granularity remain largely underdeveloped. In this paper, we provide a general framework for converting digital trace data into estimates of migration transitions and for systematically analyzing their variation along a quasi-continuous time scale, analogous to a survival function. From migration theory, we develop two simple hypotheses regarding how we expect our estimated migration transition functions to behave. We then test our hypotheses on simulated data and empirical data from three platforms in two internal migration contexts: geotagged Tweets and Gowalla check-ins in the United States, and cell-phone call detail records in Senegal. Our results demonstrate the need for evaluating the internal consistency of migration estimates derived from digital trace data before using them in substantive research. At the same time, however, common patterns across our three empirical data sets point to an emergent research agenda using digital trace data to study the specific functional relationship between estimates of migration and time and how this relationship varies by geography and population characteristics.||internal migration||Senegal, USA||MNO / Social media|
|Rampazzo F., J. Bijak, A. Vitali, I. Weber, E. Zagheni||2021||A framework for estimating migrant stocks using digital traces and survey data: an application in the United Kingdom||Demography (2021) 58 (6): 2193–2218||An accurate estimation of international migration is hampered by a lack of timely and comprehensive data, with different definitions and measures of migration adopted by different countries. Thus, we complement traditional data sources for the United Kingdom with social media data. Our aim is to understand whether information from digital traces can help measure international migration. The Bayesian framework proposed in the Integrated Model of European Migration is used to combine data from the Labour Force Survey (LFS) and the Facebook Advertising Platform in order to study the number of European migrants in the UK, aiming to produce more accurate estimates of European migrants. The overarching model is divided into a Theory-Based Model of migration, and a Measurement Error Model. We review the quality of the LFS and Facebook data, paying particular attention to the biases of these sources. The results indicate visible yet uncertain differences between model estimates using the Bayesian framework and individual sources. Sensitivity analysis techniques are used to evaluate the quality of the model. The advantages and limitations of this approach, which can be applied in other contexts, are also discussed. We cannot necessarily trust any individual source, but combining them through modelling offers valuable insights.||international migration||the United Kingdom||Social media (Facebook)|
|Sîrbu A., G. Andrienko, N. Andrienko, C. Boldrini, M. Conti, F. Giannotti, R. Guidotti, S. Bertoli, J. Kim, C.I. Muntean, L. Pappalardo, A. Passarella, D. Pedreschi, L. Pollacci, F. Pratesi, R. Sharma||2021||Human migration: the big data perspective||International Journal of Data Science and Analytics, volume 11, pages 341–360 (2021)||How can big data help to understand the migration phenomenon? In this paper, we try to answer this question through an analysis of various phases of migration, comparing traditional and novel data sources and models at each phase. We concentrate on three phases of migration, at each phase describing the state of the art and recent developments and ideas. The first phase includes the journey, and we study migration flows and stocks, providing examples where big data can have an impact. The second phase discusses the stay, i.e. migrant integration in the destination country. We explore various data sets and models that can be used to quantify and understand migrant integration, with the final aim of providing the basis for the construction of a novel multi-level integration index. The last phase is related to the effects of migration on the source countries and the return of migrants.||international migration / internal migration||not applicable||Others (Big Data in general)|
|Chi G., F. Lin, G. Chi, J. Blumenstock||2020||A general approach to detecting migration events in digital trace data||PLoS ONE 15(10): e0239408||Empirical research on migration has historically been fraught with measurement challenges. Recently, the increasing ubiquity of digital trace data—from mobile phones, social media, and related sources of ‘big data’—has created new opportunities for the quantitative analysis of migration. However, most existing work relies on relatively ad hoc methods for inferring migration. Here, we develop and validate a novel and general approach to detecting migration events in trace data. We benchmark this method using two different trace datasets: four years of mobile phone metadata from a single country’s monopoly operator, and three years of geo-tagged Twitter data. The novel measures more accurately reflect human understanding and evaluation of migration events, and further provide more granular insight into migration spells and types than what are captured in standard survey instruments.||international migration||not applicable||MNO / Social media|
|Alexander M., K. Polimis, E. Zagheni||2020||Combining social media and survey data to nowcast migrant stocks in the United States||Popul Res Policy Rev (2020). https://doi.org/10.1007/s11113-020-09599-3||Measuring and forecasting migration patterns has important implications for understanding broader population trends, for designing policy effectively and for allocating resources. However, data on migration and mobility are often lacking, and those that do exist are not available in a timely manner. Social media data offer new opportunities to provide more up-to-date demographic estimates and to complement more traditional data sources. Facebook’s Advertising Platform, for example, is a potentially rich data source of demographic information that is regularly updated. However, Facebook’s users are not representative of the underlying population. This paper proposes a statistical framework to combine social media data with traditional survey data to produce timely ‘nowcasts’ of migrant stocks by state in the United States. The model incorporates bias adjustment of Facebook data, and a pooled principal component time series approach, to account for correlations across age, time and space. We use the model to estimate and project migrants from Mexico, India and Germany, three migrant groups with varying levels and trends of migration in the US. By comparing short-term projections with data from the American Community Survey, we show that the model predictions outperform alternatives that rely solely on either social media or survey data.||international migration||USA||Social media|
|Hsiao Y., L. Fiorio, J. Wakefield, E. Zagheni||2020||Modeling the bias of digital data: an approach to combining digital and survey data to estimate and predict migration trends||MPIDR Working Paper WP-2020-019, 28 pages.||Reliable and timely estimates of migration flows are needed to guide our policy decisions and to improve our understanding of migration processes. However, obtaining timely and fine-grained estimates remains an elusive goal. Digital data provide granular information on time and space based on large sample sizes, but because these samples are often not representative of the general population, the estimates obtained by analyzing these data are biased. We propose a generic method for combining digital and survey data for the purposes of migration estimation by accounting for the bias structure of digital data. Specifically, we show that if the bias has a structure over time and space that can be statistically modeled, we can combine different sources of data for the purposes of prediction. We illustrate our approach by combining geo-located Twitter data for more than two million users (2010-2016) with data from the American Community Survey (ACS) to estimate state-level emigration in the United States. We propose a joint model that draws from both ACS and Twitter data by modeling the spatial and temporal correlation structure of Twitter biases. We show that while Twitter-based estimates are upwardly biased, when these estimates are combined with ACS estimates, the resulting predictions of internal migration flows are more accurate than predictions based on ACS data only. Our method can be used to forecast future migration flows or to fill in missing time periods for which survey estimates are not available. Finally, our model is flexible and can be extended to incorporate multiple sources of data, such as Twitter data, cellphone records, administrative reports, and survey estimates.||internal migration||USA||Social media|
|Mazzoli M., B. Diechtiareff, A. Tugores, W. Wives, N. Adler, P. Colet, J.J. Ramasco, J. Paniagua||2020||Migrant mobility flows characterized with digital data||PloS one, 2020, Vol.15 (3), p.e0230264-e0230264||Monitoring migration flows is crucial to respond to humanitarian crisis and to design efficient policies. This information usually comes from surveys and border controls, but timely accessibility and methodological concerns reduce its usefulness. Here, we propose a method to detect migration flows worldwide using geolocated Twitter data. We focus on the migration crisis in Venezuela and show that the calculated flows are consistent with official statistics at country level. Our method is versatile and far-reaching, as it can be used to study different features of migration as preferred routes, settlement areas, mobility through several countries, spatial integration in cities, etc. It provides finer geographical and temporal resolutions, allowing the exploration of issues not contemplated in official records. It is our hope that these new sources of information can complement official ones, helping authorities and humanitarian organizations to better assess when and where to intervene on the ground.||international migration||Venezuela||Social media|
|DeWaard J., J.E. Johnson, S.D. Whitaker||2020||Out-migration from and return migration to Puerto Rico after Hurricane Maria: evidence from the consumer credit panel||Population and environment, 2020-09-01, Vol.42 (1), p.28-42||In this research brief, we contribute to a much-needed, initial, and growing inventory of data on Puerto Rican migration after Hurricane Maria. Using data from the Federal Reserve Bank of New York/Equifax Consumer Credit Panel, we provide a detailed account of out-migration from and return migration to Puerto Rico in the quarters and years after Hurricane Maria. We show that out-migration from Puerto Rico was and remains elevated after Hurricane Maria, particularly for more vulnerable places with respect to water area and especially substandard housing. We also show that return migration to Puerto Rico by the second quarter of 2019 is low, 12–13%, with those emigrating from relatively more vulnerable places returning to the island at comparably higher levels than those from less vulnerable places. Taken together, our results help to round out a small, but growing body of research on migration after Hurricane Maria and other extreme weather events.||internal migration / population displacement||USA (Puerto Rico)||Survey (Consumer Credit Panel)|
|Böhme M.H., A. Gröger, T. Stöhr||2020||Searching for a better life: Predicting international migration with online search keywords||Journal of Development Economics, Volume 142, January 2020, 102347||Migration data remains scarce, particularly in the context of developing countries. We demonstrate how geo-referenced online search data can be used to measure migration intentions in origin countries and to predict bilateral migration flows. Our approach provides strong additional predictive power for international migration flows when compared to reference models from the migration and trade literature. We provide evidence, based on survey data, that our measures partly reflect genuine migration intentions and that they outperform any of the established predictors of migration flows in terms of predictive power, especially in the bilateral within dimension. Our findings contribute to the literature by (1) providing a novel way for the measurement of migration intentions, (2) allowing real-time predictions of current migration flows ahead of official statistics, and (3) improving the performance of conventional models of migration flows.||international migration||OECD Member States||Search engine(s)|
|Wanner P.||2020||How well can we estimate immigration trends using Google data?||Quality & quantity, 2020, Vol.55 (4), p.1181||For a country to efficiently monitor international migration, quick access to information on migration flows is helpful. However, traditional data sources fail to provide immediate information on migration flows and do not facilitate the correct anticipation of these flows in the short term. To tackle this issue, this paper evaluates the predictive capacity of big data to estimate the current level or to predict short-term flows. The results show that Google Trends can provide information that reflects the attractiveness of Switzerland for immigrants from different countries and predict, to some extent, current and future (short-term) migration flows of adults arriving from Spain or Italy. However, the predictions appear not to be satisfactory for other flows (from France and Germany). Additional studies based on alternative approaches are needed to validate or overturn our study results.||international migration||Switzerland||Search engine(s)|
|Palotti J., N. Adler, A. Morales-Guzman, J. Villaveces, V. Sekara, M. Garcia Herranz, M. Al-Asad, I. Weber||2020||Monitoring of the Venezuelan exodus through Facebook’s advertising platform||PLOS ONE 15(2): e0229175||Venezuela is going through the worst economical, political and social crisis in its modern history. Basic products like food or medicine are scarce and hyperinflation is combined with economic depression. This situation is creating an unprecedented refugee and migrant crisis in the region. Governments and international agencies have not been able to consistently leverage reliable information using traditional methods. Therefore, to organize and deploy any kind of humanitarian response, it is crucial to evaluate new methodologies to measure the number and location of Venezuelan refugees and migrants across Latin America. In this paper, we propose to use Facebook’s advertising platform as an additional data source for monitoring the ongoing crisis. We estimate and validate national and sub-national numbers of refugees and migrants and break-down their socio-economic profiles to further understand the complexity of the phenomenon. Although limitations exist, we believe that the presented methodology can be of value for real-time assessment of refugee and migrant crises world-wide.||international migration / population displacement||Venezuela||Social media|
|Blumenstock J., G. Chi, X. Tan||2019||Migration and the Value of Social Networks||CEPR Discussion Paper No. DP13611||What is the value of a social network? Prior work suggests two distinct mechanisms that have historically been difficult to differentiate: as a conduit of information, and as a source of social and economic support. We use a rich 'digital trace' dataset to link the migration decisions of millions of individuals to the topological structure of their social networks. We find that migrants systematically prefer 'interconnected' networks (where friends have common friends) to 'expansive' networks (where friends are well connected). A micro-founded model of network-based social capital helps explain this preference: migrants derive more utility from networks that are structured to facilitate social support than from networks that efficiently transmit information.||internal migration / human mobility||Rwanda||MNO|
|Hankaew S., S. Phithakkitnukoon, M.G. Demissie, L. Kattan, Z. Smoreda, C. Ratti||2019||Inferring and Modeling Migration Flows Using Mobile Phone Network Data||IEEE access, 2019, Vol.7, p.164746-164758||Estimating migration flows and forecasting future trends is important, both to understand the causes and effects of migration and to implement policies directed at supplying particular services. Over the years, less research has been done on modeling migration flows than the efforts allocated to modeling other flow types, for instance, commute. Limited data availability has been one of the major impediments for empirical analyses and for theoretical advances in the modeling of migration flows. As a migration trip takes place much less frequently than a commute, it requires a longitudinal set of data for the analysis. This study makes use of massive mobile phone network data to infer migration trips and their distribution. Insightful characteristics of the inferred migration trips are revealed, such as intra/inter-district migration flows, migration distance distribution, and origin-destination (O-D) movements. For migration trip distribution modelling, log-linear model, traditional gravity model, and recently introduced radiation model were examined with different approaches taken in defining parameters for each model. As a result, the gravity and log-linear models with a direct distance (displacement) used as its travel cost and district centroids used as the reference points perform best among the other alternative models. A radiation model that considers district population performs best among the radiation models, but worse than that of the gravity and log-linear models.||internal migration||Portugal||MNO|
|Lai S., E. Zu Erbach-Schoenberg, C. Pezzulo, N.W. Ruktanonchai, A. Sorichetta, J. Steele, T. Li, C.A. Dooley, A.J. Tatem||2019||Exploring the use of mobile phone data for national migration statistics||Palgrave Commun. 2019 Mar 26;5:34||Statistics on internal migration are important for keeping estimates of subnational population numbers up-to-date as well as urban planning, infrastructure development and impact assessment, among other applications. However, migration flow statistics typically remain constrained by the logistics of infrequent censuses or surveys. The penetration rate of mobile phones is now high across the globe with rapid recent increases in ownership in low-income countries. Analysing the changing spatiotemporal distribution of mobile phone users through anonymized call detail records (CDRs) offers the possibility to measure migration at multiple temporal and spatial scales. Based on a dataset of 72 billion anonymized CDRs in Namibia from October 2010 to April 2014, we explore how internal migration estimates can be derived and modelled from CDRs at subnational and annual scales, and how precision and accuracy of these estimates compare to census-derived migration statistics. We also demonstrate the use of CDRs to assess how migration patterns change over time, with a finer temporal resolution compared to censuses. Moreover, we show how gravity-type spatial interaction models built using CDRs can accurately capture migration flows. Results highlight that estimates of migration flows made using mobile phone data is a promising avenue for complementing more traditional national migration statistics and obtaining more timely and local data.||internal migration||Namibia||MNO|
|Alexander M., K. Polimis, E. Zagheni||2019||The impact of Hurricane Maria on out-migration from Puerto Rico: evidence from Facebook data||Population and Development Review, 45:3, 617–630 (2019)||Natural disasters such as hurricanes can cause substantial population out-migration. However, the magnitude of population movements is difficult to estimate using only traditional sources of migration data. We utilize data obtained from Facebook's advertising platform to estimate out-migration from Puerto Rico in the months after Hurricane Maria. We find evidence to indicate a 17.0% increase in the number of Puerto Rican migrants present in the US over the period October 2017 to January 2018. States with the biggest increases were Florida, New York and Pennsylvania, and there were disproportionately larger increases in the 15-30 age groups and for men compared to women. Additionally, we find evidence of subsequent return migration to Puerto Rico over the period January 2018 to March 2018. These results illustrate the power of complementing social media and traditional data to monitor demographic indicators over time, particularly after a shock, such as a natural disaster, to understand large changes in population characteristics.||international migration / population displacement||USA (Puerto Rico)||Social media|
|Del Fava E., A. Wisniowski, E. Zagheni||2019||Modeling international migration flows by integrating multiple data sources||SocArXiv, originally posted on: 14 November 2019 (2019), unpublished||Migration has become a significant source of population change at the global level, with broad societal implications. Although understanding the drivers of migration is critical to enacting effective policies, theoretical advances in the study of migration processes have been limited by the lack of data on flows of migrants, or by the fragmented nature of these flows. In this paper, we build on existing Bayesian modeling strategies to develop a statistical framework for integrating different types of data on migration flows. We offer estimates, as well as associated measures of uncertainty, for immigration, emigration, and net migration flows among 31 European countries, by combining administrative and household survey data from 2002 to 2015. Substantively, we document the historical impact of the EU enlargement and the free movement of workers in Europe on migration flows. Methodologically, our approach improves on the Integrated Modeling of European Migration (IMEM) framework by providing a robust statistical framework for evaluating recent migration trends that is flexible enough to be further extended to incorporate new data sources, like social media.||international migration||EU Member States, Iceland, Norway, Switzerland, the United Kingdom||Registers / Survey (Labour Force Survey)|
|Righi A.||2019||Assessing migration through social media: a review||Mathematical Population Studies, Volume 26, 2019 - Issue 2: Methods for Big Data in Social Sciences||Social media can be used not only for evaluating migration flows almost in real time and the degree of integration in the destination countries but also for the understanding of public opinion sentiment about immigration. Experiences based on scraping social media are reviewed, and the use of geo-located data and advertising platforms turns out to be the most promising opportunities supplied by these sources. The current challenge is to measure the sentiment of Italian-speaking twitterers toward migration.||international migration||not applicable||Social media|
|Spyratos S., M. Vespe, F. Natale, I. Weber, E. Zagheni, M. Rango||2019||Quantifying international human mobility patterns using Facebook Network data||PLOS ONE 14(10): e0224134||Quantifying global international mobility patterns can improve migration governance. Despite decades of calls by the international community to improve international migration statistics, the availability of timely and disaggregated data about long-term and short-term migration at the global level is still very limited. In this study, we investigate the feasibility of using non-traditional data sources to fill existing gaps in migration statistics. To this end, we use anonymised and publicly available data provided by Facebook’s advertising platform. Facebook’s advertising platform classifies its users as “lived in country X” if they previously lived in country X, and now live in a different country. Drawing on statistics about Facebook Network users (Facebook, Instagram, Messenger, and the Audience Network) who have lived abroad and applying a sample bias correction method, we estimate the number of Facebook Network (FN) “migrants” in 119 countries of residence and in two time periods by age, gender, and country of previous residence. The correction method estimates the probability of a person being a FN user based on age, sex, and country of current and previous residence. We further estimate the correlation between FN-derived migration estimates and reference official migration statistics. By comparing FN-derived migration estimates in two different time periods, January-February and August-September 2018, we successfully capture the increase in Venezuelan migrants in Colombia and Spain in 2018. FN-derived migration estimates cannot replace official migration statistics, as they are not representative, and the exact methods the FN uses for classifying its users are not known, and might change over time. However, after carefully assessing the validity of the FN-derived estimates by comparing them with data from reliable sources, we conclude that these estimates can be used for trend analysis and early-warning purposes.||international migration||World||Social media (Facebook)|
|Cesare N., H. Lee, T. McCormick, E. Spiro, E. Zagheni||2018||Promises and Pitfalls of Using Digital Traces for Demographic Research||Demography, 2018-10-01, Vol.55 (5), p.1979-1999||The digital traces that we leave online are increasingly fruitful sources of data for social scientists, including those interested in demographic research. The collection and use of digital data also presents numerous statistical, computational, and ethical challenges, motivating the development of new research approaches to address these burgeoning issues. In this article, we argue that researchers with formal training in demography—those who have a history of developing innovative approaches to using challenging data—are well positioned to contribute to this area of work. We discuss the benefits and challenges of using digital trace data for social and demographic research, and we review examples of current demographic literature that creatively use digital trace data to study processes related to fertility, mortality, and migration. Focusing on Facebook data for advertisers—a novel “digital census” that has largely been untapped by demographers—we provide illustrative and empirical examples of how demographic researchers can manage issues such as bias and representation when using digital trace data. We conclude by offering our perspective on the road ahead regarding demography and its role in the data revolution.||international migration / internal migration / human mobility||not applicable||Social media (Facebook, LinkedIn)|
|Rayer S.||2018||Estimating the Migration of Puerto Ricans to Florida Using Flight Passenger Data||Bureau of Economic and Business Research, Population Studies||Florida has been a primary destination for migrants from Puerto Rico for many years. Hurricane Maria, which made landfall in Puerto Rico on September 20, 2017, caused extensive damage on the island. In its aftermath, thousands of Puerto Ricans have moved to the U.S. mainland, many of them to Florida. Estimating the size of this inflow to Florida is no easy task, given that no direct measures of migration are available. Various indicators such as flight passenger arrivals, individuals served at multi-agency resource centers, school enrollments, FEMA applications, U.S. postal service address changes, and mobile phone data have been analyzed, often leading to quite different estimates. In this paper, the feasibility of using flight passenger data to estimate net migration between Puerto Rico and Florida is explored. The study compares historical flight passenger data to migration estimates from the American Community Survey (ACS). This is followed by a more detailed analysis of the flight passenger flows between Puerto Rico and Florida since Hurricane Maria; flight passenger flows between Puerto Rico and airports in other states on the U.S. mainland are also briefly examined. The study finds that – with some caveats – the flight passenger data may indeed be useful for estimating the hurricane-induced migration from Puerto Rico to Florida. Based on the available evidence, it is estimated that about 30,000 to 50,000 Puerto Ricans moved to Florida in the aftermath of Hurricane Maria.||internal migration / population displacement||USA (Puerto Rico)||Others (flight passengers data)|
|Chow T.E., R.T. Schuermann, A.H. Ngu, K.R. Dahal||2018||Spatial mining of migration patterns from web demographics||International Journal of Geographical Information Science, Volume 32, Issue 10, Pages 1977-1998||Volunteered Geographic Information, social media, and data from Information and Communication Technology are emerging sources of big data that contribute to the development and understanding of the spatiotemporal distribution of human population. However, the inherent anonymity of these crowd-sourced or crowd-harvested data sources lack the socioeconomic and demographic attributes to examine and explain human mobility and spatiotemporal patterns. In this paper, we investigate an Internet-based demographic data source, personal microdata databases publicly accessible on the World Wide Web (hereafter web demographics), as potential sources of aspatial and spatiotemporal information regarding the landscape of human dynamics. The objectives of this paper are twofold: (1) to develop an analytical framework to identify mobile population from web demographics as an individual-level residential history data, and (2) to explore their geographic and demographic patterns of migration. Using web demographics of Vietnamese–Americans in Texas collected in 2010 as a case study, this paper (1) addresses entity resolution and identifies mobile population through the application of a Cost-Sensitive Alternative Decision Tree (CS-ADT) algorithm, (2) investigates migration pathways and clusters to include both short- and long-distance patterns, and (3) analyze the demographic characteristics of mobile population and the functional relationship with travel distance. By linking the physical space at the individual level, this unique methodology attempts to enhance the understanding of human movement at multiple spatial scales.||international migration / human mobility||USA (Texas)||Web demographics|
|Ma T., R. Lu, N. Zhao, S.-L. Shaw||2018||An estimate of rural exodus in China using location-aware data||PLOS ONE 13(7): e0201458||The rapidly developing economy and growing urbanization in China have created the largest rural-to-urban migration in human history. Thus, a comprehensive understanding of the pattern of rural flight and its prevalence and magnitude over the country is increasingly important for sociological and political concerns. Because of the limited availability of internal migration data, which was derived previously from the decennial population census and small-scale household survey, we could not obtain timely and consistent observations for rural depopulation dynamics across the whole country. In this study, we use aggregate location-aware data collected from mobile location requests in the largest Chinese social media platform during the period of the 2016 Chinese New Year to conduct a nationwide estimate of rural depopulation in China (in terms of the grid cell-level prevalence and the magnitude) based on the world’s largest travel period. Our results suggest a widespread rural flight likely occurring in 60.2% (36.5%-81.0%, lower-upper estimate) of rural lands at the grid cell-level and covering ~1.55 (1.48–1.94) million villages and hamlets, most of China’s rural settlement sites. Moreover, we find clear regional variations in the magnitude and spatial extent of the estimated rural depopulation. These variations are likely connected to regional differences in the size of the source population, largely because of the nationwide prevalence of rural flight in today’s China. Our estimate can provide insights into related investigations of China’s rural depopulation and the potential of increasingly available crowd-sourced data for demographic studies.||internal migration||China||Social media|
|Yuan M.||2018||Human dynamics in space and time: A brief history and a view forward.||Transactions in GIS. Aug2018, Vol. 22 Issue 4, p900-912||This article highlights the key intellectual development in human dynamics research, examines the modeling emphases in publications, and argues for research directions in need. Human dynamics research is discussed in two broad directions: spacing time and timing space, to model human activities and interactions. Time is essential to human dynamics research. Space, while often being overlooked, in complement with time is critical to understanding human dynamics because knowing where activities take place is essential to knowing how and why people act and interact. Some interactions allow remote or asynchronized participations, and others require movement to collocate individuals for participating in synchronized activities. A spacing time approach examines the temporal gaps between interactions. A timing space approach investigates the spatial pulses between interactions. Primary research in the spacing time of human dynamics established queueing theories to explain the bursts and heavy‐tailed distribution of human interactions. Although research on the timing space of human dynamics enjoys growing popularity with data from geo‐tagged social media and location‐aware social internet of things (SIoT), its publications remain mostly exploratory. This article suggests a hierarchical framework to systematically study human dynamics and relate findings to build the body of knowledge about human dynamics.||international migration / internal migration / human mobility||not applicable||MNO / Social media|
|Carammia M., S.M. Iacus, T. Wilkin||2020||Forecasting asylum-related migration flows with machine learning and data at scale||arXiv:2011.04348 [stat.AP]||The effects of the so-called "refugee crisis" of 2015-16 continue to dominate the political agenda in Europe. Migration flows were sudden and unexpected, leaving governments unprepared and exposing significant shortcomings in the field of migration forecasting. Migration is a complex system typified by episodic variation, underpinned by causal factors that are interacting, highly context dependent and short-lived. Correspondingly, migration monitoring relies on scattered data, while approaches to forecasting focus on specific migration flows and often have inconsistent results that are difficult to generalise at the regional or global levels. Here we show that adaptive machine learning algorithms that integrate official statistics and non-traditional data sources at scale can effectively forecast asylum-related migration flows. We focus on asylum applications lodged in countries of the European Union (EU) by nationals of all countries of origin worldwide; the same approach can be applied in any context provided adequate migration or asylum data are available. We exploit three tiers of data - geolocated events and internet searches in countries of origin, detections of irregular crossings at the EU border, and asylum recognition rates in countries of destination - to effectively forecast individual asylum-migration flows up to four weeks ahead with high accuracy. Uniquely, our approach a) monitors potential drivers of migration in countries of origin to detect changes early onset; b) models individual country-to-country migration flows separately and on moving time windows; c) estimates the effects of individual drivers, including lagged effects; d) provides forecasts of asylum applications up to four weeks ahead; e) assesses how patterns of drivers shift over time to describe the functioning and change of migration systems.||international migration||EU Member States, Norway, Switzerland, the United Kingdom||Search engine(s) (Google) / Registers (irregular border crossing, asylum applications) / Others (Global Database of Events, Language, and Tone)|
|Fiorio L., G. Abel, J. Cai, E. Zagheni, I. Weber, G. Vinue||2017||Using Twitter Data to Estimate the Relationships between Short-term Mobility and Long-term Migration||Long Session III: Talking, Thinking, and Living Online, WebSci '17, June 25-28, 2017, Troy, NY, USA. Web Science 2017 103-110||Migration estimates are sensitive to definitions of time interval and duration. For example, when does a tourist become a migrant? As a result, harmonizing across different kinds of estimates or data sources can be difficult. Moreover, in countries like the United States, that do not have a national registry system, estimates of internal migration typically rely on survey data that can require over a year from data collection to publication. In addition, each survey can ask only a limited set of questions about migration (e.g., where did you live a year ago? where did you live five years ago?). We leverage a sample of geo-referenced Twitter tweets for about 62,000 users, spanning the period between 2010 and 2016, to estimate a series of US internal migration flows under varying time intervals and durations. Our findings, expressed in terms of ‘migration curves’, document, for the first time, the relationships between short-term mobility and long-term migration. The results open new avenues for demographic research. More specifically, future directions include the use of migration curves to produce probabilistic estimates of long-term migration from short-term (and vice versa) and to nowcast mobility rates at different levels of spatial and temporal granularity using a combination of previously published American Community Survey data and up-to-date data from a panel of Twitter users.||internal migration||USA||Social media (Twitter)|
|Zagheni E., I. Weber, K. Gummadi||2017||Leveraging Facebook’s Advertising Platform to Monitor Stocks of Migrants||Population and Development Review 43(4):721-734||international migration||USA (California, Texas)||Social media (Facebook)|
|Connor P.||2017||Can Google Trends Forecast International Migration Flows? Perhaps, but Under Certain Conditions||Presented at Population Association of America meetings||Big data have been used to forecast a variety of human behaviors, including migration. This paper uses the case of recent Syrian and Iraqi refugees entering Europe to explore whether online search data can be used to forecast forced migration. Comparisons between Google Trends and arrival/asylum seeker data demonstrate that online search activity might be a useful method for forecasting forced migration when the online search data can be specified for migratory population with high technology access. Additionally, there must also be few barriers to migration (migrant resources, border controls) for online search data to predict migration.||international migration / population displacement||EU Member States, Iraq, Syria||Search engine(s)|
|Kamenjuk P., A. Aasa, J. Sellin||2017||Mapping changes of residence with passive mobile positioning data: the case of Estonia||International Journal of Geographical Information Science, 31:7, 1425-1447||Similar to every process involving quantitative research, the study of migration heavily depends on the data available for analysis. The available movement data limit the type of questions that can be asked, and as a result, certain aspects of human spatial mobility have yet to be examined. The development of information and communication technologies and their widespread adoption offers new datasets, methods and interpretations that make it possible to study social processes at a new level. For example, mobile positioning data can aid in overcoming certain constraints embedded in traditional data sources (such as censuses or questionnaires) for study of the connections between daily mobility and change of residence. This study presents a framework for mapping changes of residence using data from passive mobile positioning and an anchor point model to better understand the limits of these methods and their contribution to understanding long-term mobility. The study concludes that the most important considerations in monitoring change of residence using passive mobile position data include the continuity of the time-series data, the varying structure of the mobile tower network and the diversified nature of human mobility. The fine spatial and temporal granularities of passive mobile positioning data allow us to study human movement at a detailed scale.||internal migration / human mobility||Estonia||MNO|
|Hayes B.||2017||Migration and data protection: Doing no harm in an age of mass displacement, mass surveillance and “big data”||International Review of the Red Cross (2017), 99 (1), 179–209.||This article considers the key data protection challenges facing humanitarian organizations providing assistance to refugees, internally displaced persons and migrants. These challenges are particularly significant for several reasons: because data protection has come relatively late to the humanitarian sector; because humanitarian organizations are under pressure to innovate rapidly; because the global communications architecture on which many of these innovations depend is inherently vulnerable to State surveillance; and because States are deploying increasingly sophisticated and coercive means to prevent irregular forms of migration and/or subjecting humanitarian organizations to surveillance and disruption. The first part of the article outlines the fundamental rights challenges presented by contemporary data-driven migration control paradigms. The second outlines concerns about “data-driven humanitarianism” and “mass surveillance” to show how humanitarian organizations risk inadvertently exacerbating these problems. The third assesses specific data protection challenges that humanitarian organizations face and the policies and practices they have developed in response. The article concludes with some brief observations on the technical and political dynamics shaping their efforts to comply with their legal and ethical obligations, and calls for the sector to work together to extend data protection norms and outlaw cyber-attacks by State actors.||international migration||not applicable||Others (Big Data in general)|
|Hughes C., E. Zagheni, G. Abel, A. Wisniowski, A. Sorichetta, I. Weber, A.J. Tatem||2016||Inferring migrations, traditional methods and new approaches based on mobile phone, social media, and other big data: feasibility study on inferring (labour) mobility and migration in the European Union from big data and social media data||This report addresses the question of whether it is technically, financially and legally feasible to estimate geographic mobility and migration flows in the European Union. Our assessment indicates that the feasibility is dependent on a number of factors: 1. It depends on the data that one can have access to. Some data sources can be accessed by anyone with the appropriate technical skills (e.g., samples of Twitter tweets); some can be purchased (e.g., historical tweets); some are not for sale and require partnerships with companies (e.g., Yahoo!, Facebook, LinkedIn, and mobile phone providers); some are not shared by companies (Google does not share data, except for some aggregate indexes, like the ones in Google Trends). 2. It depends on the outcome desired. Estimating trends or changes in trends in migration flows is feasible and can be done in a timely manner. Getting accurate and precise estimates for special populations, like refugees, may or may not be feasible depending on the context: it would require further research. Likewise, obtaining estimates of short-term migration by education, gender or employment status is feasible. Obtaining unbiased estimates of short-term mobility from a single, non-representative source would be more difficult. It may be feasible in some circumstances (e.g., when the data set is rich enough for the use of post-stratification techniques), but not in others. 3. It depends on legal obstacles. Companies may have terms and conditions or non-disclosure agreements for data sharing that may or may not include inconsistencies with the rules governing universities and funding agencies. We have not identified major issues in this area, but each individual collaboration across units would require some careful examination of the terms and conditions in order to resolve any potential lack of consistency.||international migration / internal migration / human mobility||EU Member States||MNO / Social media|
|Messias J., F. Benevenuto, I. Weber, E. Zagheni||2016||From Migration Corridors to Clusters: The Value of Google+ Data for Migration Studies||2016 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM). Proceedings of the 2016 IEEE/ACM International Conference on Advances in Social Networks Analysis and||Recently, there have been considerable efforts to use online data to investigate international migration. These efforts show that Web data are valuable for estimating migration rates and are relatively easy to obtain. However, existing studies have only investigated flows of people along migration corridors, i.e. between pairs of countries. In our work, we use data about “places lived” from millions of Google+ users in order to study migration ‘clusters’, i.e. groups of countries in which individuals have lived sequentially. For the first time, we consider information about more than two countries people have lived in. We argue that these data are very valuable because this type of information is not available in traditional demographic sources which record country-to-country migration flows independent of each other. We show that migration clusters of country triads cannot be identified using information about bilateral flows alone. To demonstrate the additional insights that can be gained by using data about migration clusters, we first develop a model that tries to predict the prevalence of a given triad using only data about its constituent pairs. We then inspect the groups of three countries which are more or less prominent, compared to what we would expect based on bilateral flows alone. Next, we identify a set of features such as a shared language or colonial ties that explain which triple of country pairs are more or less likely to be clustered when looking at country triples. Then we select and contrast a few cases of clusters that provide some qualitative information about what our data set shows. The type of data that we use is potentially available for a number of social media services. We hope that this first study about migration clusters will stimulate the use of Web data for the development of new theories of international migration that could not be tested appropriately before.||international migration||World||Search engine(s) (Google)|
|Ashton W., P. Bhattacharyya, E. Galatsanou, S. Ogoe, L. Wilkinson||2016||Emerging Uses of Big Data in Immigration Research||Submitted to the Social Sciences and Humanities Research Council of Canada, Grant No. 421-2015-2036||international migration||Canada||Others (Big Data in general)|
|Yuan H., C. Zhu||2016||Shock and roam: Migratory responses to natural disasters||Economics letters, 2016-11, Vol.148, p.37-40||Using novel data on roaming mobile phones and a synthetic control method, we find out-migration in the area affected by the 2014 Ludian earthquake in Southwest China. The induced emigration emerged within a few weeks after the earthquake and persisted for months. We find no evidence that the earthquake drew back migrants who, prior to the earthquake, had emigrated to Guangdong province, which is a manufacturing hub and the primary destination of rural migrant workers in China.||internal migration||China||MNO|
|Zagheni E., I. Weber||2015||Demographic research with non-representative internet data||International Journal of Manpower, Vol. 36 No. 1, 2015||Internet data hold many promises for demographic research, but come with severe drawbacks due to several types of bias. The purpose of this paper is to review the literature that uses internet data for demographic studies and presents a general framework for addressing the problem of selection bias in non-representative samples. The authors propose two main approaches to reduce bias. When ground truth data are available, the authors suggest a method that relies on calibration of the online data against reliable official statistics. When no ground truth data are available, the authors propose a difference in differences approach to evaluate relative trends. The authors offer a generalization of existing techniques. Although there is not a definite answer to the question of whether statistical inference can be made from non-representative samples, the authors show that, when certain assumptions are met, the authors can extract signal from noisy and biased data. The methods are sensitive to a number of assumptions. These include some regularities in the way the bias changes across different locations, different demographic groups and between time steps. The assumptions that we discuss might not always hold. In particular, the scenario where bias varies in an unpredictable manner and, at the same time, there is no “ground truth” available to continuously calibrate the model, remains challenging and beyond the scope of this paper. The paper combines a critical review of existing substantive and methodological literature with a generalization of prior techniques. It intends to provide a fresh perspective on the issue and to stimulate the methodological discussion among social scientists.||international migration / internal migration / human mobility||not applicable||Others (Big Data in general)|
|Zagheni E., V. R. Kiran Garimella, I. Weber, B. State||2014||Inferring International and Internal Migration Patterns from Twitter Data||WWW ’14 Companion, April 7-11, 2014, Seoul, Korea||Data about migration flows are largely inconsistent across countries, typically outdated, and often inexistent. Despite the importance of migration as a driver of demographic change, there is limited availability of migration statistics. Generally, researchers rely on census data to indirectly estimate flows. However, little can be inferred for specific years between censuses and for recent trends. The increasing availability of geolocated data from online sources has opened up new opportunities to track recent trends in migration patterns and to improve our understanding of the relationships between internal and international migration. In this paper, we use geolocated data for about 500,000 users of the social network website “Twitter”. The data are for users in OECD countries during the period May 2011-April 2013. We evaluated, for the subsample of users who have posted geolocated tweets regularly, the geographic movements within and between countries for independent periods of four months, respectively. Since Twitter users are not representative of the OECD population, we cannot infer migration rates at a single point in time. However, we proposed a difference-in-differences approach to reduce selection bias when we infer trends in out-migration rates for single countries. Our results indicate that our approach is relevant to address two longstanding questions in the migration literature. First, our methods can be used to predict turning points in migration trends, which are particularly relevant for migration forecasting. Second, geolocated Twitter data can substantially improve our understanding of the relationships between internal and international migration. Our analysis relies uniquely on publicly available data that could be potentially available in real time and that could be used to monitor migration trends. The Web Science community is well-positioned to address, in future work, a number of methodological and substantive questions that we discuss in this article.||international migration / internal migration||OECD Member States||Social media (Twitter)|
|State B., M. Rodriguez, D. Helbing, E. Zagheni||2014||Migration of Professionals to the US. Evidence from LinkedIn Data||Proceedings of SocInfo 2014. Springer’s Lecture Note Series in Computer Science, 531-543||We investigate trends in the international migration of professional workers by analyzing a dataset of millions of geolocated career histories provided by LinkedIn, the largest online platform for professionals. The new dataset confirms that the United States is, in absolute terms, the top destination for international migrants. However, we observe a decrease, from 2000 to 2012, in the percentage of professional migrants, worldwide, who have the United States as their country of destination. The pattern holds for persons with Bachelor’s, Master’s, and PhD degrees alike, and for individuals with degrees from highly-ranked worldwide universities. Our analysis also reveals the growth of Asia as a major professional migration destination during the past twelve years. Although we see a decline in the share of employment-based migrants going to the United States, our results show a recent rebound in the percentage of international students who choose the United States as their destination.||international migration||World||Social media (LinkedIn)|
|Laszlo F., M. Rango||2014||Can Big Data help us achieve a “migration data revolution”?||MIGRATION POLICY PRACTICE, Vol. IV, Number 2, April–June 2014||international migration / internal migration||not applicable||Others (Big Data in general)|
|Schuermann R.T., T.E. Chow||2014||Geovisualization of Local and Regional Migration Using Web Demographics||International archives of the photogrammetry, remote sensing and spatial information sciences., 2014-01-01, Vol.XL (2), p.93-97||The intent of this research was to augment and facilitate analyses, which gauges the feasibility of web-mined demographics to study spatio-temporal dynamics of migration. As a case study, we explored the spatio-temporal dynamics of Vietnamese Americans (VA) in Texas through geovisualization of mined demographic microdata from the World Wide Web. Based on string matching across all demographic attributes, including full name, address, date of birth, age and phone number, multiple records of the same entity (i.e. person) over time were resolved and reconciled into a database. Migration trajectories were geovisualized through animated sprites by connecting the different addresses associated with the same person and segmenting the trajectory into small fragments. Intra-metropolitan migration patterns appeared at the local scale within many metropolitan areas. At the scale of metropolitan area, varying degrees of immigration and emigration manifest different types of migration clusters. This paper presents a methodology incorporating GIS methods and cartographic design to produce geovisualization animation, enabling the cognitive identification of migration patterns at multiple scales. Identification of spatio-temporal patterns often stimulates further research to better understand the phenomenon and enhance subsequent modeling.||internal migration / human mobility||USA (Texas)||Web demographics|
|David B., D. Dana, F. Abel||2013||On the effect of mobile phone on migrant remittances: A closer look at international transfers||Electronic commerce research and applications, 2013-07, Vol.12 (4), p.280-288||Recent empirical studies based on surveys bring evidence that international remittances are more the result of familial intertemporal contracts than self-insurance motivations. Exploiting transaction-level remittance data carried out by 3294 migrants between 2004 and 2009 in France from a mobile money transfer service to recipients located in Sub-Sahara Africa, Middle East, Eastern Europe and Madagascar, we find using descriptive statistics and econometric tests that migrants send preferably more money to themselves than to family and non-family members. This result tends to support the idea that the mobile technology impacts migrant remittances and then the standard findings in the remittance literature as migrants seem to be more concerned by the accumulation of savings (self-insurance motivations) than about altruistic or household insurance motivations.||international migration||France||Others (mobile money transfer service)|
|State B., I. Weber, E. Zagheni||2013||Studying Inter-National Mobility through IP Geolocation||WSDM’13, February 4–8, 2013, Rome, Italy. Proceedings of ACM Web Search and Data Mining. WWW Companion 2014 439-444||The increasing ubiquity of Internet use has opened up new avenues in the study of human mobility. Easily-obtainable geolocation data resulting from repeated logins to the same website offer the possibility of observing long-term patterns of mobility for a large number of individuals. We use data on the geographic locations from where over 100 million anonymized users log into Yahoo! services to generate the first global map of short- and medium-term mobility flows. We develop a protocol to identify anonymized users who, over a one-year period, had spent more than 3 months in a different country from their stated country of residence (“migrants”), and users who spent less than a month in a country different from their country of residence (“tourists”). We compute aggregate estimates of migration probabilities between countries, as inferred from a user’s location over the observed period. Geolocation data allow us to characterize also the pendularity of migration flows – i.e., the extent to which migrants travel back and forth between their countries of origin and destination. We use data regarding visa regimes, colonial ties, geographic location and economic development to predict migration and tourism flows. Our analysis shows the persistence of traditional migration patterns as well as the emergence of new routes. Migrations tend to be more pendular between countries that are close to each other. We observe particularly high levels of pendularity within the European Economic Area, even after we control for distance and visa regimes. The dataset, methodology and results presented have important implications for the travel industry, as well as for several disciplines in social sciences, including geography, demography and the sociology of networks.||international migration||World||Others (Yahoo! users)|
|Sirbu A., L. Pollacci, J. Kim, G. Rossetti||2021||Report on the developed indicators for nowcasting stock migration by Twitter data||Leuven: HumMingBird project 870661 – H2020, Deliverable 5.1||Measuring migration stocks and flows over time in various countries is crucial but challenging. Migration-related information has important implications for effective policy design and for understanding broader population trends. Researchers and policymakers mostly rely on official statistics and administrative data. However, these data typically show numerous drawbacks, e.g., low time and space resolution (i.e. the measurements are not frequent in time and are usually aggregated at high level geographical regions), inconsistency between different countries (different reporting standards, different definitions, varying data quality and collection methods) and delays. This explains why the current availability of data from social media, like Twitter, has offered new opportunities to attempt to obtain more updated information and estimates and to improve and integrate traditional data sources (Sirbu et al., 2020). Social media datasets contain several types of user information, cover large population groups, even across multiple nations, and are often available cheaply and on time. Both traditional and novel data are currently employed to study different aspects of migration, such as the economic and cultural effects connected with migrants, monitoring flows, and estimating stocks.|
Among the various types of social big data, user-generated content from Twitter can be a valuable resource in migration studies. This has been proven by recent works using Twitter data to study various migration-related problems (Zagheni et al., 2014; Mazzoli et al., 2020; Lenormand et al., 2015; Moise et al., 2016; Valle et al., 2017). However, data collection, pre-processing and analysis are far from straightforward and can result in biased data that might influence the final results. Bias comes from various sources. One is related to sampling bias introduced by the way individuals use Twitter: there is a selection bias in the general Twitter population, but also when restricting the analysis to certain subsets of data, such as geolocalised Tweets. Moreover, the data, being user-generated content, may be very noisy, impeding their use in certain areas of research, or resulting in limited knowledge after cleaning. They may also contain misleading or fake information (e.g. users using a nickname, or declaring the wrong profile location). Furthermore, ethical and privacy issues need to be considered carefully, as biased results and publication of sensitive information might harm migrants.
One research question of interest is whether Twitter data can be useful to understand migrant stocks. The hypothesis is that this type of data can provide better time and space resolution, and provide more timely information compared to official statistics. Here we investigate these aspects, through various types of analyses. A first analysis, that we define as top-down, employs features extracted from the text of tweets generated by a community to estimate migration stocks, using machine learning techniques. The second approach, bottom-up, labels individual users with a nationality and residence, using a data-driven model, and estimates stocks from resulting labels. The two approaches are complementary and will be discussed in detail in the following sections. A different type of analysis based on Twitter is the study of a restricted period of time and geographical area, to analyse a specific event. We will show preliminary results for the analysis of border rush at the Turkish border in March 2021. Here we employ language as the main feature that determines nationality of users.
The report will end with a discussion of challenges, limitations and advantages that arose during our analyses, and what we believe are the benefits of employing this type of data for migration studies.
|international migration||France, Germany, Ireland, Italy, the Netherlands, Spain, the United Kingdom||Social media (Twitter)|
|Culora A., E. Thomas, E. Dufresne, M. Cefalu, C. Fays, S. Hoorens||2021||Using social media data to 'nowcast' international migration around the globe||Santa Monica, CA: RAND Corporation||The aim of this study was to develop a methodological tool to 'nowcast' migrant stocks by using real-time data from the Facebook Marketing Application Programming Interface (API) and official migration data from EU member states and states in the United States. To meet this aim, RAND researchers collected real-time data that could provide estimates of migrant stocks in the countries of interest from the Facebook Marketing API, along with migrant-stocks data from official sources from 2010 onwards, and developed a Bayesian model capable of combining the Facebook and official migration data to nowcast stocks of migrants in EU member states and US states. The model developed in this study is capable of producing near real-time nowcasts for each source of official statistics, which can serve as an early-warning system to anticipate 'shock events' and rapid migration trends that would otherwise be captured too late or not at all by official migration data sources. This tool therefore enables decision-makers to make informed, evidence-based policy decisions in the rapidly changing social policy sphere of international migration. The study also provides a useful example of how to combine 'big data' with traditional data to improve measurement and estimation which can be applied to other social and demographic phenomena. Suggestions for future work include continued data collection activities to extend the temporal overlap between Facebook data and official migration statistics, nowcasting migrant stocks for demographic subgroups, and exploring alternative specifications for the Bayesian model to improve the accuracy of the nowcasts.||international migration||Belgium, France, USA||Social media (Facebook)|
|Blumenstock J.E.||2012||Inferring patterns of internal migration from mobile phone call records: evidence from Rwanda||Information Technology for Development, 18:2, 107-125||Understanding the causes and effects of internal migration is critical to the effective design and implementation of policies that promote human development. However, a major impediment to deepening this understanding is the lack of reliable data on the movement of individuals within a country. Government censuses and household surveys, from which most migration statistics are derived, are difficult to coordinate and costly to implement, and typically do not capture the patterns of temporary and circular migration that are prevalent in developing economies. In this paper, we describe how new information and communications technologies (ICTs), and mobile phones in particular, can provide a new source of data on internal migration. As these technologies quickly proliferate throughout the developing world, billions of individuals are now carrying devices from which it is possible to reconstruct detailed trajectories through time and space. Using Rwanda as a case study, we demonstrate how such data can be used in practice. We develop and formalize the concept of inferred mobility, and compute this and other metrics on a large data set containing the phone records of 1.5 million Rwandans over four years. Our empirical results corroborate the findings of a recent government survey that notes relatively low levels of permanent migration in Rwanda. However, our analysis reveals more subtle patterns that were not detected in the government survey. Namely, we observe high levels of temporary and circular migration, and note significant heterogeneity in mobility within the Rwandan population. Our goals in this research are thus twofold. First, we intend to provide a new quantitative perspective on certain patterns of internal migration in Rwanda that are unobservable using standard survey techniques. 
Second, we seek to contribute to the broader literature by illustrating how new forms of ICT can be used to better understand the behavior of individuals in developing countries.||internal migration||Rwanda||MNO|
|Simini F., M.C. González, A. Maritan, A.-L. Barabási||2012||A universal model for mobility and migration patterns||Nature (London), 2012-04-05, Vol.484 (7392), p.96-100||Introduced in its contemporary form in 1946 (ref. 1), but with roots that go back to the eighteenth century (ref. 2), the gravity law (refs 1, 3, 4) is the prevailing framework with which to predict population movement (refs 3, 5, 6), cargo shipping volume (ref. 7) and inter-city phone calls (refs 8, 9), as well as bilateral trade flows between nations (ref. 10). Despite its widespread use, it relies on adjustable parameters that vary from region to region and suffers from known analytic inconsistencies. Here we introduce a stochastic process capturing local mobility decisions that helps us analytically derive commuting and mobility fluxes that require as input only information on the population distribution. The resulting radiation model predicts mobility patterns in good agreement with mobility and transport patterns observed in a wide range of phenomena, from long-term migration patterns to communication volume between different regions. Given its parameter-free nature, the model can be applied in areas where we lack previous mobility measurements, significantly improving the predictive accuracy of most of the phenomena affected by mobility and transport processes.||internal migration / human mobility||USA||MNO|
|Zagheni E., I. Weber||2012||You are where you E-mail: Using E-mail Data to Estimate International Migration Rates||Proceedings of ACM Web Science 2012||International migration is one of the major determinants of demographic change. Although efforts to produce comparable statistics are underway, estimates of demographic flows are inexistent, outdated, or largely inconsistent, for most countries. We estimate age and gender-specific migration rates using data extracted from a large sample of Yahoo! e-mail messages. Self-reported age and gender of anonymized e-mail users were linked to the geographic locations (mapped from IP addresses) from where users sent e-mail messages over time (2009-2011). The users' country of residence over time was inferred as the one from where most e-mail messages were sent. Our estimates of age profiles of migration are qualitatively consistent with existing administrative data sources. Selection bias generates uncertainty for estimates at one point in time, especially for developing countries. However, our approach allows us to compare in a reliable way migration trends of females and males. We document the recent increase in human mobility and we observe that female mobility has been increasing at a faster pace. Our findings suggest that e-mail data may complement existing migration data, resolve inconsistencies arising from different definitions of migration, and provide new and rich information on mobility patterns and social networks of migrants. The use of digital records for demographic research has the potential to become particularly important for developing countries, where the diffusion of Internet will be faster than the development of mature demographic registration systems.||international migration||World||Others (Yahoo! emails)|
|Avramescu A., A. Wiśniowski||2021||Now-casting Romanian migration into the United Kingdom by using Google Search engine data||Demographic Research, vol 45 - article 40, p. 1219–1254||Background: Short-term forecasts of international migration are often based on data that are incomplete, biased, and reported with delays. There is also a scarcity of migration forecasts based on combined traditional and new forms of data. Objective: This research assessed an inclusive approach of supplementing official migration statistics, typically reported with a delay, with the so-called big data from Google searches to produce short-term forecasts (“now-casts”) of immigration flows from Romania to the United Kingdom. Methods: Google Trends data were used to create composite variables depicting the general interest of Romanians in migrating into the United Kingdom. These variables were then assessed as predictors and compared with benchmark results by using univariate time series models. Results: The proposed Google Trends indices related to employment and education, which exhaust all possible keywords and eliminate language bias, match trends observed in the migration statistics. They are also capable of moderate reductions in prediction errors. Conclusions: Google Trends data have some potential to indicate up-to-date current trends of interest in mobility, which may serve as useful predictors of sudden changes in migration. However, these data do not always improve the accuracy of forecasts. The usability of Google Trends is also limited to short-term migration forecasting and requires understanding of contexts surrounding origin and destination countries. Contribution: This work provides an example on combining Google Trends and official migration data to produce short-term forecasts, illustrated with flows from Romania to the UK.
It also discusses caveats and suggests future work for using these data in migration forecasting.||international migration||Romania, the United Kingdom||Search engine(s) (Google)|
|Wesolowski A., C.O. Buckee, D.K. Pindolia, N. Eagle, D.L. Smith, A.J. Garcia, A. J. Tatem||2013||The Use of Census Migration Data to Approximate Human Movement Patterns across Temporal Scales||PLoS ONE 8(1): e52971||Human movement plays a key role in economies and development, the delivery of services, and the spread of infectious diseases. However, it remains poorly quantified partly because reliable data are often lacking, particularly for low-income countries. The most widely available are migration data from human population censuses, which provide valuable information on relatively long timescale relocations across countries, but do not capture the shorter-scale patterns, trips less than a year, that make up the bulk of human movement. Census-derived migration data may provide valuable proxies for shorter-term movements however, as substantial migration between regions can be indicative of well connected places exhibiting high levels of movement at finer time scales, but this has never been examined in detail. Here, an extensive mobile phone usage data set for Kenya was processed to extract movements between counties in 2009 on weekly, monthly, and annual time scales and compared to data on change in residence from the national census conducted during the same time period. We find that the relative ordering across Kenyan counties for incoming, outgoing and between-county movements shows strong correlations. Moreover, the distributions of trip durations from both sources of data are similar, and a spatial interaction model fit to the data reveals the relationships of different parameters over a range of movement time scales. 
Significant relationships between census migration data and fine temporal scale movement patterns exist, and results suggest that census data can be used to approximate certain features of movement patterns across multiple temporal scales, extending the utility of census-derived migration data.||internal migration||Kenya||MNO / Survey (census)|
|Gendronneau C., A. Wiśniowski, D. Yildiz, E. Zagheni, L. Fiorio, Y. Hsiao, M. Stepanek, I. Weber, G. Abel, S. Hoorens||2019||Measuring Labour Mobility and Migration Using Big Data - Exploring the potential of social-media data for measuring EU mobility flows and stocks of EU movers||Publications Office of the European Union||Internal freedom of movement is one of the European Union's four fundamental freedoms and is necessary for the EU single market to function. Yet official statistics on the migration of workers are constrained. They are limited in their ability to distinguish population subgroups, come with a considerable time lag of a year or more and are fully reliant on individual member states' measurements. Current data sources also tend to underestimate the overall extent of mobility by not covering short-term moves and not capturing the most recent movers.|
Given the importance of freedom of movement, it is crucial for European institutions to have robust, rich and up-to-date data to monitor it. Big data sources from social media, such as Twitter and Facebook, offer opportunities to bridge the gap between official statistics and recent migration trends.
The European Commission’s Directorate-General for Employment, Social Affairs and Inclusion commissioned RAND Europe to investigate social media data's potential use for measuring EU mobility. Researchers collaborated with experts from the Vienna Institute for Demography, the University of Manchester, Washington University, Max Planck Institute for Demographic Research and the Qatar Computing Research Institute. This report discusses the activities, results and findings of this study and presents recommendations for future work in this area. It is aimed at a specialist audience of academics and policy-makers with a specific interest in measuring and monitoring migration flows.
|international migration||EU Member States||Social media (Facebook, Twitter)|
|Pollacci L., A. Sirbu, G. Rossetti||2021||Estimating highly skilled migration in Europe Part 1: scientific migration||Leuven: HumMingBird project 870661 – H2020, Deliverable 5.2||international migration||World||Others (Microsoft Academic Knowledge Graph)|